Preface to the Second Edition


Undergraduate courses in mathematics are commonly of two types. On the one hand 
there are courses in subjects, such as linear algebra or real analysis, with which it is 
considered that every student of mathematics should be acquainted. On the other hand 
there are courses given by lecturers in their own areas of specialization, which are 
intended to serve as a preparation for research. There are, I believe, several reasons 
why students need more than this. 

First, although the vast extent of mathematics today makes it impossible for any 
individual to have a deep knowledge of more than a small part, it is important to have 
some understanding and appreciation of the work of others. Indeed the sometimes 
surprising interrelationships and analogies between different branches of mathematics 
are both the basis for many of its applications and the stimulus for further development.
Secondly, different branches of mathematics appeal in different ways and require
different talents. It is unlikely that all students at one university will have the same 
interests and aptitudes as their lecturers. Rather, they will only discover what their 
own interests and aptitudes are by being exposed to a broader range. Thirdly, many 
students of mathematics will become, not professional mathematicians, but scientists, 
engineers or schoolteachers. It is useful for them to have a clear understanding of the 
nature and extent of mathematics, and it is in the interests of mathematicians that there 
should be a body of people in the community who have this understanding. 

The present book attempts to provide such an understanding of the nature and 
extent of mathematics. The connecting theme is the theory of numbers, at first sight 
one of the most abstruse and irrelevant branches of mathematics. Yet by exploring 
its many connections with other branches, we may obtain a broad picture. The topics 
chosen are not trivial and demand some effort on the part of the reader. As Euclid 
already said, there is no royal road. In general I have concentrated attention on those 
hard-won results which illuminate a wide area. If I am accused of picking the eyes out 
of some subjects, I have no defence except to say “But what beautiful eyes!” 

The book is divided into two parts. Part A, which deals with elementary number 
theory, should be accessible to a first-year undergraduate. To provide a foundation for 
subsequent work, Chapter I contains the definitions and basic properties of various 
mathematical structures. However, the reader may simply skim through this chapter 
and refer back to it later as required. Chapter V, on Hadamard’s determinant problem,
shows that elementary number theory may have unexpected applications. 

Part B, which is more advanced, is intended to provide an undergraduate with some 
idea of the scope of mathematics today. The chapters in this part are largely independent,
except that Chapter X depends on Chapter IX and Chapter XIII on Chapter XII.

Although much of the content of the book is common to any introductory work 
on number theory, I wish to draw attention to the discussion here of quadratic fields 
and elliptic curves. These are quite special cases of algebraic number fields and algebraic
curves, and it may be asked why one should restrict attention to these special
cases when the general cases are now well understood and may even be developed 
in parallel. My answers are as follows. First, to treat the general cases in full rigour 
requires a commitment of time which many will be unable to afford. Secondly, these 
special cases are those most commonly encountered and more constructive methods 
are available for them than for the general cases. There is yet another reason. Sometimes
in mathematics a generalization is so simple and far-reaching that the special
case is more fully understood as an instance of the generalization. For the topics 
mentioned, however, the generalization is more complex and is, in my view, more 
fully understood as a development from the special case. 

At the end of each chapter of the book I have added a list of selected references, 
which will enable readers to travel further in their own chosen directions. Since the 
literature is voluminous, any such selection must be somewhat arbitrary, but I hope 
that mine may be found interesting and useful. 

The computer revolution has made possible calculations on a scale and with a 
speed undreamt of a century ago. One consequence has been a considerable increase 
in ‘experimental mathematics’ —the search for patterns. This book, on the other hand, 
is devoted to ‘theoretical mathematics’ —the explanation of patterns. I do not wish to 
conceal the fact that the former usually precedes the latter. Nor do I wish to conceal 
the fact that some of the results here have been proved by the greatest minds of the past 
only after years of labour, and that their proofs have later been improved and simplified 
by many other mathematicians. Once obtained, however, a good proof organizes and 
provides understanding for a mass of computational data. Often it also suggests further 
developments. 

The present book may indeed be viewed as a ‘treasury of proofs’. We concentrate 
attention on this aspect of mathematics, not only because it is a distinctive feature 
of the subject, but also because we consider its exposition is better suited to a book 
than to a blackboard or a computer screen. In keeping with this approach, the proofs 
themselves have been chosen with some care and I hope that a few may be of interest 
even to those who are no longer students. Proofs which depend on general principles 
have been given preference over proofs which offer no particular insight. 

Mathematics is a part of civilization and an achievement in which human beings 
may take some pride. It is not the possession of any one national, political or religious 
group and any attempt to make it so is ultimately destructive. At the present time 
there are strong pressures to make academic studies more ‘relevant’. At the same time, 
however, staff at some universities are assessed by ‘citation counts’ and people are 
paid for giving lectures on chaos, for example, that are demonstrably rubbish. 
The theory of numbers provides ample evidence that topics pursued for their own
intrinsic interest can later find significant applications. I do not contend that curiosity 
has been the only driving force. More mundane motives, such as ambition or the 
necessity of earning a living, have also played a role. It is also true that mathematics 
pursued for the sake of applications has been of benefit to subjects such as number 
theory; there is a two-way trade. However, it shows a dangerous ignorance of history 
and of human nature to promote utility at the expense of spirit. 

This book has its origin in a course of lectures which I gave at the Victoria 
University of Wellington, New Zealand, in 1975. The demands of my own research 
have hitherto prevented me from completing it, although I have continued to collect 
material. If it succeeds at all in conveying some idea of the power and beauty of mathematics,
the labour of writing it will have been well worthwhile.

As with a previous book, I have to thank Helge Tverberg, who has read most of the 
manuscript and made many useful suggestions. 

The first Phalanger Press edition of this book appeared in 2002. A revised edition, 
which was reissued by Springer in 2006, contained a number of changes. I removed 
an error in the statement and proof of Proposition II.12 and filled a gap in the proof 
of Proposition III.12. The statements of the Weil conjectures in Chapter IX and of a
result of Heath-Brown in Chapter X were modified, following comments by J.-P. Serre. 
I also corrected a few misprints, made many small expository changes and expanded 
the index. 

In the present edition I have made some more expository changes and have 
added a few references at the end of some chapters to take account of recent developments.
For more detailed information the Internet has the advantage over a
book. The reader is referred to the American Mathematical Society’s MathSciNet 
(www.ams.org/mathscinet) and to The Number Theory Web maintained by Keith 
Matthews (www.maths.uq.edu.au/~krmm/). 

I am grateful to Springer for undertaking the commercial publication of my book 
and hope you will be also. Many of those who have contributed to the production of 
this new softcover edition are unknown to me, but among those who are I wish to thank 
especially Alicia de los Reyes and my sons Nicholas and Philip. 


W.A. Coppel 
May, 2009 
Canberra, Australia 
The Expanding Universe of Numbers


For many people, numbers must seem to be the essence of mathematics. Number 
theory, which is the subject of this book, is primarily concerned with the properties 
of one particular type of number, the ‘whole numbers’ or integers. However, there 
are many other types, such as complex numbers and p-adic numbers. Somewhat surprisingly,
a knowledge of these other types turns out to be necessary for any deeper
understanding of the integers. 

In this introductory chapter we describe several such types (but defer the study of 
p-adic numbers to Chapter VI). To embark on number theory proper the reader may 
proceed to Chapter II now and refer back to the present chapter, via the Index, only as
occasion demands. 

When one studies the properties of various types of number, one becomes aware 
of formal similarities between different types. Instead of repeating the derivations of 
properties for each individual case, it is more economical — and sometimes actually 
clearer — to study their common algebraic structure. This algebraic structure may be 
shared by objects which one would not even consider as numbers. 

There is a pedagogic difficulty here. Usually a property is discovered in one context 
and only later is it realized that it has wider validity. It may be more digestible to 
prove a result in the context of number theory and then simply point out its wider 
range of validity. Since this is a book on number theory, and many properties were 
first discovered in this context, we feel free to adopt this approach. However, to make 
the statements of such generalizations intelligible, in the latter part of this chapter we 
describe several basic algebraic structures. We do not attempt to study these structures 
in depth, but restrict attention to the simplest properties which throw light on the work 
of later chapters. 
The label ‘0’ given to this section may be interpreted to stand for ‘Optional’. We collect
here some definitions of a logical nature which have become part of the common language
of mathematics. Those who are not already familiar with this language, and who
are repelled by its abstraction, should consult this section only when the need arises. 
We will not formally define a set, but will simply say that it is a collection of
objects, which are called its elements. We write $a \in A$ if $a$ is an element of the set $A$
and $a \notin A$ if it is not.

A set may be specified by listing its elements. For example, $A = \{a, b, c\}$ is the set
whose elements are $a, b, c$. A set may also be specified by characterizing its elements.
For example,

$$A = \{x \in \mathbb{R} : x^2 < 2\}$$

is the set of all real numbers $x$ such that $x^2 < 2$.
If two sets $A, B$ have precisely the same elements, we say that they are equal and
write $A = B$. (If $A$ and $B$ are not equal, we write $A \neq B$.) For example,

$$\{x \in \mathbb{R} : x^2 = 1\} = \{1, -1\}.$$


Just as it is convenient to admit $0$ as a number, so it is convenient to admit the
empty set $\emptyset$, which has no elements, as a set.

If every element of a set $A$ is also an element of a set $B$ we say that $A$ is a subset
of $B$, or that $A$ is included in $B$, or that $B$ contains $A$, and we write $A \subseteq B$. We say
that $A$ is a proper subset of $B$, and write $A \subset B$, if $A \subseteq B$ and $A \neq B$.

Thus $\emptyset \subseteq A$ for every set $A$, and $\emptyset \subset A$ if $A \neq \emptyset$. Set inclusion has the following
obvious properties:

(i) $A \subseteq A$;
(ii) if $A \subseteq B$ and $B \subseteq A$, then $A = B$;
(iii) if $A \subseteq B$ and $B \subseteq C$, then $A \subseteq C$.


For any sets $A, B$, the set whose elements are the elements of $A$ or $B$ (or both) is
called the union or ‘join’ of $A$ and $B$ and is denoted by $A \cup B$:

$$A \cup B = \{x : x \in A \text{ or } x \in B\}.$$

The set whose elements are the common elements of $A$ and $B$ is called the intersection
or ‘meet’ of $A$ and $B$ and is denoted by $A \cap B$:

$$A \cap B = \{x : x \in A \text{ and } x \in B\}.$$

If $A \cap B = \emptyset$, the sets $A$ and $B$ are said to be disjoint.


Fig. 1. Union and Intersection. 
It is easily seen that union and intersection have the following algebraic properties:

$$A \cup A = A, \qquad A \cap A = A,$$
$$A \cup B = B \cup A, \qquad A \cap B = B \cap A,$$
$$(A \cup B) \cup C = A \cup (B \cup C), \qquad (A \cap B) \cap C = A \cap (B \cap C),$$
$$(A \cup B) \cap C = (A \cap C) \cup (B \cap C), \qquad (A \cap B) \cup C = (A \cup C) \cap (B \cup C).$$

Set inclusion could have been defined in terms of either union or intersection, since
$A \subseteq B$ is the same as $A \cup B = B$ and also the same as $A \cap B = A$.


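For any sets $A, B$, the set of all elements of $B$ which are not also elements of $A$ is
called the difference of $B$ from $A$ and is denoted by $B \setminus A$:

$$B \setminus A = \{x : x \in B \text{ and } x \notin A\}.$$

It is easily seen that

$$C \setminus (A \cup B) = (C \setminus A) \cap (C \setminus B),$$
$$C \setminus (A \cap B) = (C \setminus A) \cup (C \setminus B).$$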


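An important special case is where all sets under consideration are subsets of a
given universal set $X$. For any $A \subseteq X$, we have

$$\emptyset \cup A = A, \qquad \emptyset \cap A = \emptyset,$$
$$X \cup A = X, \qquad X \cap A = A.$$

The set $X \setminus A$ is said to be the complement of $A$ (in $X$) and may be denoted by $A^c$ for
fixed $X$. Evidently

$$\emptyset^c = X, \qquad X^c = \emptyset,$$
$$A \cup A^c = X, \qquad A \cap A^c = \emptyset,$$
$$(A^c)^c = A.$$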


By taking $C = X$ in the previous relations for differences, we obtain ‘De Morgan’s
laws’:

$$(A \cup B)^c = A^c \cap B^c, \qquad (A \cap B)^c = A^c \cup B^c.$$

Since $A \cap B = (A^c \cup B^c)^c$, set intersection can be defined in terms of unions and
complements. Alternatively, since $A \cup B = (A^c \cap B^c)^c$, set union can be defined in
terms of intersections and complements.
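The identities above can be spot-checked mechanically on a small example. The following is a minimal Python sketch, assuming an illustrative universal set $X = \{1, 2, 3, 4\}$ (not from the text); it runs through every subset of $X$ using the built-in `frozenset` operators `|`, `&` and `-`, and confirms the distributive laws, the two characterizations of inclusion and De Morgan's laws. A finite check of this kind illustrates the laws but of course does not prove them.

```python
from itertools import chain, combinations

X = frozenset({1, 2, 3, 4})  # illustrative universal set (an assumption)


def subsets(s):
    """Return every subset of s as a frozenset."""
    items = sorted(s)
    return [frozenset(c)
            for c in chain.from_iterable(combinations(items, r)
                                         for r in range(len(items) + 1))]


def complement(a):
    """A^c = X minus A, relative to the fixed universal set X."""
    return X - a


def check_set_laws():
    """Check the laws for all A, B, C in the power set of X; return its size."""
    subs = subsets(X)
    for a in subs:
        for b in subs:
            # A ⊆ B  iff  A ∪ B = B  iff  A ∩ B = A
            assert (a <= b) == ((a | b) == b) == ((a & b) == a)
            # De Morgan's laws
            assert complement(a | b) == complement(a) & complement(b)
            assert complement(a & b) == complement(a) | complement(b)
            for c in subs:
                # the two distributive laws
                assert (a | b) & c == (a & c) | (b & c)
                assert (a & b) | c == (a | c) & (b | c)
    return len(subs)


print(check_set_laws())  # 16 subsets of a 4-element set
```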

For any sets $A, B$, the set of all ordered pairs $(a, b)$ with $a \in A$ and $b \in B$ is called
the (Cartesian) product of $A$ by $B$ and is denoted by $A \times B$.

Similarly one can define the product of more than two sets. We mention only one
special case. For any positive integer $n$, we write $A^n$ instead of $A \times \cdots \times A$ for the set
of all (ordered) $n$-tuples $(a_1, \ldots, a_n)$ with $a_j \in A$ $(1 \le j \le n)$. We call $a_j$ the $j$-th
coordinate of the $n$-tuple.

A binary relation on a set $A$ is just a subset $R$ of the product set $A \times A$. For any
$a, b \in A$, we write $aRb$ if $(a, b) \in R$. A binary relation $R$ on a set $A$ is said to be
reflexive if $aRa$ for every $a \in A$;
symmetric if $bRa$ whenever $aRb$;
transitive if $aRc$ whenever $aRb$ and $bRc$.


It is said to be an equivalence relation if it is reflexive, symmetric and transitive. 

If $R$ is an equivalence relation on a set $A$ and $a \in A$, the equivalence class $R_a$
of $a$ is the set of all $x \in A$ such that $xRa$. Since $R$ is reflexive, $a \in R_a$. Since $R$ is
symmetric, $b \in R_a$ implies $a \in R_b$. Since $R$ is transitive, $b \in R_a$ implies $R_b \subseteq R_a$. It
follows that, for all $a, b \in A$, either $R_a = R_b$ or $R_a \cap R_b = \emptyset$.

A partition $\mathcal{C}$ of a set $A$ is a collection of nonempty subsets of $A$ such that each
element of $A$ is an element of exactly one of the subsets in $\mathcal{C}$.

Thus the distinct equivalence classes corresponding to a given equivalence relation
on a set $A$ form a partition of $A$. It is not difficult to see that, conversely, if $\mathcal{C}$ is a
partition of $A$, then an equivalence relation $R$ is defined on $A$ by taking $R$ to be the
set of all $(a, b) \in A \times A$ for which $a$ and $b$ are elements of the same subset in the
collection $\mathcal{C}$.
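The correspondence between equivalence relations and partitions can be made concrete for a finite set. A minimal Python sketch, assuming an illustrative relation on $A = \{1, \ldots, 9\}$ in which two numbers are related when their difference is divisible by 3 (this example is not from the text): `equivalence_classes` collects the distinct classes $R_a$, and `relation_from_partition` rebuilds the relation from the resulting partition. Both function names are hypothetical.

```python
def equivalence_classes(elements, related):
    """Return the distinct classes R_a = {x : x R a}.

    `related(x, a)` is assumed to be an equivalence relation on `elements`."""
    classes = []
    for a in elements:
        cls = frozenset(x for x in elements if related(x, a))
        if cls not in classes:          # keep each class only once
            classes.append(cls)
    return classes


def relation_from_partition(partition):
    """The set of pairs (a, b) whose members lie in the same block."""
    return {(a, b) for block in partition for a in block for b in block}


A = range(1, 10)


def same_residue(x, a):
    """Illustrative equivalence relation: x - a is divisible by 3."""
    return (x - a) % 3 == 0


classes = equivalence_classes(A, same_residue)
print(sorted(sorted(c) for c in classes))  # [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
R = relation_from_partition(classes)
print(R == {(x, a) for x in A for a in A if same_residue(x, a)})  # True
```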

Let $A$ and $B$ be nonempty sets. A mapping $f$ of $A$ into $B$ is a subset of $A \times B$ with
the property that, for each $a \in A$, there is a unique $b \in B$ such that $(a, b) \in f$. We
write $f(a) = b$ if $(a, b) \in f$, and say that $b$ is the image of $a$ under $f$ or that $b$ is the
value of $f$ at $a$. We express that $f$ is a mapping of $A$ into $B$ by writing $f: A \to B$
and we put

$$f(A) = \{f(a) : a \in A\}.$$


The term function is often used instead of ‘mapping’, especially when A and B are 
sets of real or complex numbers, and ‘mapping’ itself is often abbreviated to map. 

If $f$ is a mapping of $A$ into $B$, and if $A'$ is a nonempty subset of $A$, then the
restriction of $f$ to $A'$ is the set of all $(a, b) \in f$ with $a \in A'$.

The identity map $i_A$ of a nonempty set $A$ into itself is the set of all ordered pairs
$(a, a)$ with $a \in A$.

If $f$ is a mapping of $A$ into $B$, and $g$ a mapping of $B$ into $C$, then the composite
mapping $g \circ f$ of $A$ into $C$ is the set of all ordered pairs $(a, c)$, where $c = g(b)$ and
$b = f(a)$. Composition of mappings is associative, i.e. if $h$ is a mapping of $C$ into $D$,
then

$$(h \circ g) \circ f = h \circ (g \circ f).$$

The identity map has the obvious properties $f \circ i_A = f$ and $i_B \circ f = f$.

Let $A, B$ be nonempty sets and $f: A \to B$ a mapping of $A$ into $B$. The mapping
$f$ is said to be ‘one-to-one’ or injective if, for each $b \in B$, there exists at most one
$a \in A$ such that $(a, b) \in f$. The mapping $f$ is said to be ‘onto’ or surjective if, for
each $b \in B$, there exists at least one $a \in A$ such that $(a, b) \in f$. If $f$ is both injective
and surjective, then it is said to be bijective or a ‘one-to-one correspondence’. The
nouns injection, surjection and bijection are also used instead of the corresponding
adjectives.

It is not difficult to see that $f$ is injective if and only if there exists a mapping
$g: B \to A$ such that $g \circ f = i_A$, and surjective if and only if there exists a mapping
$h: B \to A$ such that $f \circ h = i_B$. Furthermore, if $f$ is bijective, then $g$ and $h$ are
unique and equal. Thus, for any bijective map $f: A \to B$, there is a unique inverse
map $f^{-1}: B \to A$ such that $f^{-1} \circ f = i_A$ and $f \circ f^{-1} = i_B$.

If $f: A \to B$ and $g: B \to C$ are both bijective maps, then $g \circ f: A \to C$ is also
bijective and

$$(g \circ f)^{-1} = f^{-1} \circ g^{-1}.$$
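For finite sets these constructions can be tried directly. A minimal Python sketch, assuming maps are stored as dictionaries (an illustrative encoding, not part of the text) and using made-up bijections `f` and `g`: the hypothetical helper `compose(g, f)` builds $g \circ f$, `inverse(f)` builds $f^{-1}$, and the final comparison confirms $(g \circ f)^{-1} = f^{-1} \circ g^{-1}$ for this example.

```python
def compose(g, f):
    """g ∘ f for maps stored as dicts: apply f first, then g."""
    return {a: g[f[a]] for a in f}


def inverse(f):
    """Inverse of f as a map from f(A) back to A.

    Raises ValueError if f is not injective; surjectivity is taken
    relative to the image f(A), which plays the role of B here."""
    inv = {b: a for a, b in f.items()}
    if len(inv) != len(f):
        raise ValueError("map is not injective")
    return inv


f = {1: "x", 2: "y", 3: "z"}      # illustrative bijection A -> B
g = {"x": 30, "y": 10, "z": 20}   # illustrative bijection B -> C

lhs = inverse(compose(g, f))
rhs = compose(inverse(f), inverse(g))
print(lhs == rhs)  # True
```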


1 Natural Numbers 


The natural numbers are the numbers usually denoted by 1, 2, 3, 4, 5, ... . However,
other notations are also used, e.g. for the chapters of this book. Although one notation 
may have considerable practical advantages over another, it is the properties of the 
natural numbers which are basic. 

The following system of axioms for the natural numbers was essentially given by 
Dedekind (1888), although it is usually attributed to Peano (1889): 

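The natural numbers are the elements of a set $\mathbb{N}$, with a distinguished element $1$
(one) and map $S: \mathbb{N} \to \mathbb{N}$, such that

(N1) $S$ is injective, i.e. if $m, n \in \mathbb{N}$ and $m \neq n$, then $S(m) \neq S(n)$;
(N2) $1 \notin S(\mathbb{N})$;
(N3) if $M \subseteq \mathbb{N}$, $1 \in M$ and $S(M) \subseteq M$, then $M = \mathbb{N}$.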


The element $S(n)$ of $\mathbb{N}$ is called the successor of $n$. The axioms are satisfied by
$\{1, 2, 3, \ldots\}$ if we take $S(n)$ to be the element immediately following the element $n$.

It follows readily from the axioms that $1$ is the only element of $\mathbb{N}$ which is not in
$S(\mathbb{N})$. For, if $M = S(\mathbb{N}) \cup \{1\}$, then $M \subseteq \mathbb{N}$, $1 \in M$ and $S(M) \subseteq M$. Hence, by (N3),
$M = \mathbb{N}$.

It also follows from the axioms that $S(n) \neq n$ for every $n \in \mathbb{N}$. For let $M$ be the
set of all $n \in \mathbb{N}$ such that $S(n) \neq n$. By (N2), $1 \in M$. If $n \in M$ and $n' = S(n)$ then, by
(N1), $S(n') \neq n'$. Thus $S(M) \subseteq M$ and hence, by (N3), $M = \mathbb{N}$.

The axioms (N1)-(N3) actually determine N up to ‘isomorphism’. We will deduce 
this as a corollary of the following general recursion theorem: 


Proposition 1 Given a set $A$, an element $a_1$ of $A$ and a map $T: A \to A$, there exists
exactly one map $\varphi: \mathbb{N} \to A$ such that $\varphi(1) = a_1$ and

$$\varphi(S(n)) = T\varphi(n) \quad \text{for every } n \in \mathbb{N}.$$


Proof We show first that there is at most one map with the required properties. Let $\varphi_1$
and $\varphi_2$ be two such maps, and let $M$ be the set of all $n \in \mathbb{N}$ such that

$$\varphi_1(n) = \varphi_2(n).$$

Evidently $1 \in M$. If $n \in M$, then also $S(n) \in M$, since

$$\varphi_1(S(n)) = T\varphi_1(n) = T\varphi_2(n) = \varphi_2(S(n)).$$

Hence, by (N3), $M = \mathbb{N}$. That is, $\varphi_1 = \varphi_2$.
We now show that there exists such a map $\varphi$. Let $\mathcal{C}$ be the collection of all
subsets $C$ of $\mathbb{N} \times A$ such that $(1, a_1) \in C$ and such that if $(n, a) \in C$, then also
$(S(n), T(a)) \in C$. The collection $\mathcal{C}$ is not empty, since it contains $\mathbb{N} \times A$. Moreover,
since every set in $\mathcal{C}$ contains $(1, a_1)$, the intersection $D$ of all sets $C \in \mathcal{C}$ is not empty.
It is easily seen that actually $D \in \mathcal{C}$. By its definition, however, no proper subset of
$D$ is in $\mathcal{C}$.

Let $M$ be the set of all $n \in \mathbb{N}$ such that $(n, a) \in D$ for exactly one $a \in A$ and,
for any $n \in M$, define $\varphi(n)$ to be the unique $a \in A$ such that $(n, a) \in D$. If $M = \mathbb{N}$,
then $\varphi(1) = a_1$ and $\varphi(S(n)) = T\varphi(n)$ for all $n \in \mathbb{N}$. Thus we need only show that
$M = \mathbb{N}$. As usual, we do this by showing that $1 \in M$ and that $n \in M$ implies
$S(n) \in M$.

We have $(1, a_1) \in D$. Assume $(1, a') \in D$ for some $a' \neq a_1$. If
$D' = D \setminus \{(1, a')\}$, then $(1, a_1) \in D'$. Moreover, if $(n, a) \in D'$ then $(S(n), T(a)) \in D'$,
since $(S(n), T(a)) \in D$ and $(S(n), T(a)) \neq (1, a')$, because $S(n) \neq 1$ by (N2). Hence $D' \in \mathcal{C}$.
But this is a contradiction, since $D'$ is a proper subset of $D$. We conclude that $1 \in M$.

Suppose now that $n \in M$ and let $a$ be the unique element of $A$ such that $(n, a) \in D$.
Then $(S(n), T(a)) \in D$, since $D \in \mathcal{C}$. Assume that $(S(n), a'') \in D$ for some
$a'' \neq T(a)$ and put $D'' = D \setminus \{(S(n), a'')\}$. Then $(S(n), T(a)) \in D''$ and $(1, a_1) \in D''$.
For any $(m, b) \in D''$ we have $(S(m), T(b)) \in D$. If $(S(m), T(b)) = (S(n), a'')$,
then $S(m) = S(n)$ and $T(b) = a'' \neq T(a)$, which implies $m = n$ and $b \neq a$. Thus
$D$ contains both $(n, b)$ and $(n, a)$, which contradicts $n \in M$. Hence
$(S(m), T(b)) \neq (S(n), a'')$, and so $(S(m), T(b)) \in D''$. But then $D'' \in \mathcal{C}$, which is also a
contradiction, since $D''$ is a proper subset of $D$. We conclude that $S(n) \in M$.
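In computational terms, Proposition 1 justifies definition by recursion: $\varphi(n)$ is obtained from $a_1$ by applying $T$ exactly $n - 1$ times. A minimal Python sketch, assuming natural numbers are modelled by Python integers $n \ge 1$ with $S(n) = n + 1$ (a modelling choice, not the set-theoretic construction used in the proof); the helper name `recursive_map` is hypothetical.

```python
from typing import Callable, TypeVar

A = TypeVar("A")


def recursive_map(a1: A, T: Callable[[A], A]) -> Callable[[int], A]:
    """Return phi with phi(1) = a1 and phi(S(n)) = T(phi(n)).

    Natural numbers are modelled as ints n >= 1 with S(n) = n + 1."""
    def phi(n: int) -> A:
        if n < 1:
            raise ValueError("phi is defined only on 1, 2, 3, ...")
        value = a1
        for _ in range(n - 1):   # apply T exactly n - 1 times
            value = T(value)
        return value
    return phi


# Illustrative choice: A = strings, a1 = "*", T appends one more star.
phi = recursive_map("*", lambda s: s + "*")
print([phi(n) for n in range(1, 5)])  # ['*', '**', '***', '****']
```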


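Corollary 2 If the axioms (N1)-(N3) are also satisfied by a set $\mathbb{N}'$ with element $1'$ and
map $S': \mathbb{N}' \to \mathbb{N}'$, then there exists a bijective map $\varphi$ of $\mathbb{N}$ onto $\mathbb{N}'$ such that $\varphi(1) = 1'$
and

$$\varphi(S(n)) = S'\varphi(n) \quad \text{for every } n \in \mathbb{N}.$$

Proof By taking $A = \mathbb{N}'$, $a_1 = 1'$ and $T = S'$ in Proposition 1, we see that there
exists a unique map $\varphi: \mathbb{N} \to \mathbb{N}'$ such that $\varphi(1) = 1'$ and

$$\varphi(S(n)) = S'\varphi(n) \quad \text{for every } n \in \mathbb{N}.$$

By interchanging $\mathbb{N}$ and $\mathbb{N}'$, we see also that there exists a unique map $\psi: \mathbb{N}' \to \mathbb{N}$
such that $\psi(1') = 1$ and

$$\psi(S'(n')) = S\psi(n') \quad \text{for every } n' \in \mathbb{N}'.$$

The composite map $\chi = \psi \circ \varphi$ of $\mathbb{N}$ into $\mathbb{N}$ has the properties $\chi(1) = 1$ and
$\chi(S(n)) = S\chi(n)$ for every $n \in \mathbb{N}$. But, by Proposition 1 again, $\chi$ is uniquely determined by
these properties, and the identity map on $\mathbb{N}$ also has them. Hence $\psi \circ \varphi$ is the identity map
on $\mathbb{N}$, and similarly $\varphi \circ \psi$ is the identity map on $\mathbb{N}'$. Consequently $\varphi$ is a bijection.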


We can also use Proposition 1 to define addition and multiplication of natural numbers.
By Proposition 1, for each $m \in \mathbb{N}$ there exists a unique map $s_m: \mathbb{N} \to \mathbb{N}$ such
that

$$s_m(1) = S(m), \qquad s_m(S(n)) = Ss_m(n) \quad \text{for every } n \in \mathbb{N}.$$

We define the sum of $m$ and $n$ to be

$$m + n = s_m(n).$$
It is not difficult to deduce from this definition and the axioms (N1)-(N3) the usual
rules for addition: for all $a, b, c \in \mathbb{N}$,

(A1) if $a + c = b + c$, then $a = b$; (cancellation law)
(A2) $a + b = b + a$; (commutative law)
(A3) $(a + b) + c = a + (b + c)$. (associative law)

By way of example, we prove the cancellation law. Let $M$ be the set of all $c \in \mathbb{N}$
such that $a + c = b + c$ only if $a = b$. Then $1 \in M$, since $s_a(1) = s_b(1)$ implies
$S(a) = S(b)$ and hence, by (N1), $a = b$. Suppose $c \in M$. If $a + S(c) = b + S(c)$, i.e.
$s_a(S(c)) = s_b(S(c))$, then $Ss_a(c) = Ss_b(c)$ and hence, by (N1), $s_a(c) = s_b(c)$. Since $c \in M$, this
implies $a = b$. Thus also $S(c) \in M$. Hence, by (N3), $M = \mathbb{N}$.

We now show that

$$m + n \neq n \quad \text{for all } m, n \in \mathbb{N}.$$

For a given $m \in \mathbb{N}$, let $M$ be the set of all $n \in \mathbb{N}$ such that $m + n \neq n$. Then $1 \in M$
since, by (N2), $s_m(1) = S(m) \neq 1$. If $n \in M$, then $s_m(n) \neq n$ and hence, by (N1),

$$s_m(S(n)) = Ss_m(n) \neq S(n).$$

Hence, by (N3), $M = \mathbb{N}$.
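The recursive definition of $s_m$ can be transcribed almost literally. A minimal Python sketch, assuming natural numbers are modelled by Python integers $n \ge 1$, with `S` as the successor and `pred` returning the unique $m$ with $S(m) = n$ for $n \neq 1$ (both helper names are hypothetical); it checks (A1)-(A3) and $m + n \neq n$ on a small range, which illustrates rather than proves them.

```python
def S(n: int) -> int:
    """Successor map on the model {1, 2, 3, ...} of N."""
    return n + 1


def pred(n: int) -> int:
    """The unique m with S(m) = n; by (N2) no such m exists for n = 1."""
    if n == 1:
        raise ValueError("1 is not a successor")
    return n - 1


def s(m: int, n: int) -> int:
    """s_m(n), i.e. m + n: s_m(1) = S(m) and s_m(S(k)) = S(s_m(k))."""
    return S(m) if n == 1 else S(s(m, pred(n)))


R = range(1, 9)
for a in R:
    for b in R:
        assert s(a, b) == s(b, a)                       # (A2)
        assert s(a, b) != b                             # m + n != n
        for c in R:
            assert s(s(a, b), c) == s(a, s(b, c))       # (A3)
            assert (s(a, c) == s(b, c)) == (a == b)     # (A1)
print(s(3, 4))  # 7
```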
By Proposition 1 again, for each $m \in \mathbb{N}$ there exists a unique map $p_m: \mathbb{N} \to \mathbb{N}$
such that

$$p_m(1) = m, \qquad p_m(S(n)) = s_m(p_m(n)) \quad \text{for every } n \in \mathbb{N}.$$

We define the product of $m$ and $n$ to be

$$m \cdot n = p_m(n).$$


From this definition and the axioms (N1)-(N3) we may similarly deduce the usual
rules for multiplication: for all $a, b, c \in \mathbb{N}$,

(M1) if $a \cdot c = b \cdot c$, then $a = b$; (cancellation law)
(M2) $a \cdot b = b \cdot a$; (commutative law)
(M3) $(a \cdot b) \cdot c = a \cdot (b \cdot c)$; (associative law)
(M4) $a \cdot 1 = a$. (identity element)

Furthermore, addition and multiplication are connected by

(AM1) $a \cdot (b + c) = (a \cdot b) + (a \cdot c)$. (distributive law)


As customary, we will often omit the dot when writing products and we will give 
multiplication precedence over addition. With these conventions the distributive law 
becomes simply 


$$a(b + c) = ab + ac.$$
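Multiplication can be transcribed in the same way, since $p_m(S(n)) = s_m(p_m(n))$ says that $m \cdot S(n) = m + m \cdot n$. A minimal, self-contained Python sketch, assuming natural numbers are modelled by Python integers $n \ge 1$ with successor $n + 1$ (the helper names `s` and `p` are hypothetical); it checks (M1)-(M4) and the distributive law (AM1) on a small range, as an illustration only.

```python
def s(m: int, n: int) -> int:
    """m + n via s_m(1) = S(m), s_m(S(k)) = S(s_m(k)), with S(k) = k + 1."""
    return m + 1 if n == 1 else s(m, n - 1) + 1


def p(m: int, n: int) -> int:
    """m · n via p_m(1) = m, p_m(S(k)) = s_m(p_m(k))."""
    return m if n == 1 else s(m, p(m, n - 1))


R = range(1, 7)
for a in R:
    assert p(a, 1) == a                                       # (M4)
    for b in R:
        assert p(a, b) == p(b, a)                             # (M2)
        for c in R:
            assert p(p(a, b), c) == p(a, p(b, c))             # (M3)
            assert (p(a, c) == p(b, c)) == (a == b)           # (M1)
            assert p(a, s(b, c)) == s(p(a, b), p(a, c))       # (AM1)
print(p(6, 7))  # 42
```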




We show next how a relation of order may be defined on the set $\mathbb{N}$. For any
$m, n \in \mathbb{N}$, we say that $m$ is less than $n$, and write $m < n$, if

$$m + m' = n \quad \text{for some } m' \in \mathbb{N}.$$

Evidently $m < S(m)$ for every $m \in \mathbb{N}$, since $S(m) = m + 1$. Also, if $m < n$, then
either $S(m) = n$ or $S(m) < n$. For suppose $m + m' = n$. If $m' = 1$, then $S(m) = n$. If
$m' \neq 1$, then $m' = m'' + 1$ for some $m'' \in \mathbb{N}$ and

$$S(m) + m'' = (m + 1) + m'' = m + (1 + m'') = m + m' = n.$$

Again, if $n \neq 1$, then $1 < n$, since the set consisting of $1$ and all $n \in \mathbb{N}$ such that
$1 < n$ contains $1$ and contains $S(n)$ if it contains $n$.

It will now be shown that the relation ‘<’ induces a total order on $\mathbb{N}$, which is
compatible with both addition and multiplication: for all $a, b, c \in \mathbb{N}$,

(O1) if $a < b$ and $b < c$, then $a < c$; (transitive law)
(O2) one and only one of the following alternatives holds:
$a < b$, $a = b$, $b < a$; (law of trichotomy)
(O3) $a + c < b + c$ if and only if $a < b$;
(O4) $ac < bc$ if and only if $a < b$.


The relation (O1) follows directly from the associative law for addition. We now
prove (O2). If $a < b$ then, for some $a' \in \mathbb{N}$,

$$b = a + a' = a' + a \neq a.$$

Together with (O1), this shows that at most one of the three alternatives in (O2) holds,
since $a < b$ and $b < a$ together would give $a < a$.

For a given $a \in \mathbb{N}$, let $M$ be the set of all $b \in \mathbb{N}$ such that at least one of the three
alternatives in (O2) holds. Then $1 \in M$, since $1 < a$ if $a \neq 1$. Suppose now that
$b \in M$. If $a = b$, then $a < S(b)$. If $a < b$, then again $a < S(b)$, by (O1). If $b < a$,
then either $S(b) = a$ or $S(b) < a$. Hence also $S(b) \in M$. Consequently, by (N3),
$M = \mathbb{N}$. This completes the proof of (O2).

It follows from the associative and commutative laws for addition that, if $a < b$,
then $a + c < b + c$. On the other hand, by using also the cancellation law we see that
if $a + c < b + c$, then $a < b$.

It follows from the distributive law that, if $a < b$, then $ac < bc$. Finally, suppose
$ac < bc$. Then $a \neq b$ and hence, by (O2), either $a < b$ or $b < a$. Since $b < a$ would
imply $bc < ac$, by what we have just proved, we must actually have $a < b$.

The law of trichotomy (O2) implies that, for given $m, n \in \mathbb{N}$, the equation

$$m + x = n$$

has a solution $x \in \mathbb{N}$ only if $m < n$.
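The order relation can be explored by searching for the $m'$ in its definition. A minimal Python sketch, assuming natural numbers are modelled by Python integers $n \ge 1$ and using Python's built-in addition and multiplication for $+$ and $\cdot$; the function name `less_than` is hypothetical. Because $m + m' = n$ forces $m' < n$, the search over $m'$ is finite. The loop checks (O1)-(O4) on a small range, which illustrates rather than proves them.

```python
def less_than(m: int, n: int) -> bool:
    """m < n  iff  m + m' = n for some natural number m'.

    Any such m' satisfies m' < n, so it suffices to search 1, ..., n - 1."""
    return any(m + k == n for k in range(1, n))


R = range(1, 8)
for a in R:
    for b in R:
        # (O2): exactly one of a < b, a = b, b < a holds
        assert [less_than(a, b), a == b, less_than(b, a)].count(True) == 1
        for c in R:
            if less_than(a, b) and less_than(b, c):
                assert less_than(a, c)                               # (O1)
            assert less_than(a + c, b + c) == less_than(a, b)        # (O3)
            assert less_than(a * c, b * c) == less_than(a, b)        # (O4)
print(less_than(2, 5), less_than(5, 5))  # True False
```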

As customary, we write $a \le b$ to denote either $a < b$ or $a = b$. Also, it is
sometimes convenient to write $b > a$ instead of $a < b$, and $b \ge a$ instead of $a \le b$.

A subset $M$ of $\mathbb{N}$ is said to have a least element $m'$ if $m' \in M$ and $m' \le m$ for
every $m \in M$. The least element $m'$ is uniquely determined, if it exists, by (O2). By
what we have already proved, $1$ is the least element of $\mathbb{N}$.
Proposition 3 Any nonempty subset $M$ of $\mathbb{N}$ has a least element.


Proof Assume that some nonempty subset $M$ of $\mathbb{N}$ does not have a least element.
Then $1 \notin M$, since $1$ is the least element of $\mathbb{N}$. Let $L$ be the set of all $l \in \mathbb{N}$ such that
$l < m$ for every $m \in M$. Then $L$ and $M$ are disjoint and $1 \in L$. If $l \in L$, then $S(l) \le m$
for every $m \in M$. Since $M$ does not have a least element, it follows that $S(l) \notin M$.
Thus $S(l) < m$ for every $m \in M$, and so $S(l) \in L$. Hence, by (N3), $L = \mathbb{N}$. Since
$L \cap M = \emptyset$ and $M$ is nonempty, this is a contradiction.


The method of proof by induction is a direct consequence of the axioms defining $\mathbb{N}$.
Suppose that with each $n \in \mathbb{N}$ there is associated a proposition $P_n$. To show that $P_n$ is
true for every $n \in \mathbb{N}$, we need only show that $P_1$ is true and that $P_{n+1}$ is true if $P_n$ is
true.

Proposition 3 provides an alternative approach. To show that $P_n$ is true for every
$n \in \mathbb{N}$, we need only show that if $P_m$ is false for some $m$, then $P_l$ is false for some
$l < m$. For then the set of all $n \in \mathbb{N}$ for which $P_n$ is false has no least element and
consequently is empty.

For any $n \in \mathbb{N}$, we denote by $I_n$ the set of all $m \in \mathbb{N}$ such that $m \le n$. Thus
$I_1 = \{1\}$ and $S(n) \notin I_n$. It is easily seen that

$$I_{S(n)} = I_n \cup \{S(n)\}.$$

Also, for any $p \in I_{S(n)}$, there exists a bijective map $f_p$ of $I_n$ onto $I_{S(n)} \setminus \{p\}$. For, if
$p = S(n)$ we can take $f_p$ to be the identity map on $I_n$, and if $p \in I_n$ we can take $f_p$ to
be the map defined by

$$f_p(p) = S(n), \qquad f_p(m) = m \quad \text{if } m \in I_n \setminus \{p\}.$$
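The maps $f_p$ are simple enough to build and test exhaustively for small $n$. A minimal Python sketch, assuming $I_n$ is modelled as the Python set $\{1, \ldots, n\}$ and maps as dictionaries (the function names `I` and `f_p` are hypothetical); it checks that each $f_p$ is a bijection of $I_n$ onto $I_{S(n)} \setminus \{p\}$.

```python
def I(n: int) -> set[int]:
    """I_n = {m : m <= n}, modelled as {1, ..., n}."""
    return set(range(1, n + 1))


def f_p(n: int, p: int) -> dict[int, int]:
    """Bijection of I_n onto I_{S(n)} minus {p}, for p in I_{S(n)}, as in the text."""
    if p == n + 1:                       # p = S(n): take the identity on I_n
        return {m: m for m in I(n)}
    if p not in I(n):
        raise ValueError("p must lie in I_{S(n)}")
    # send p to S(n) and fix every other element of I_n
    return {m: (n + 1 if m == p else m) for m in I(n)}


for n in range(1, 8):
    for p in I(n + 1):
        f = f_p(n, p)
        assert set(f) == I(n)                       # defined on I_n
        assert len(set(f.values())) == n            # injective
        assert set(f.values()) == I(n + 1) - {p}    # onto I_{S(n)} minus {p}
print(f_p(3, 2))  # {1: 1, 2: 4, 3: 3}
```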


Proposition 4 For any $m, n \in \mathbb{N}$, if a map $f: I_m \to I_n$ is injective and $f(I_m) \neq I_n$,
then $m < n$.


Proof The result certainly holds when $m = 1$, since $I_1 = \{1\}$. Let $M$ be the set of
all $m \in \mathbb{N}$ for which the result holds. We need only show that if $m \in M$, then also
$S(m) \in M$.

Let $f: I_{S(m)} \to I_n$ be an injective map such that $f(I_{S(m)}) \neq I_n$ and choose
$p \in I_n \setminus f(I_{S(m)})$. The restriction $g$ of $f$ to $I_m$ is also injective and $g(I_m) \neq I_n$. Since
$m \in M$, it follows that $m < n$. Assume $S(m) = n$. Then, since $I_n = I_{S(m)}$, the inverse of the
map $f_p$ defined above is a bijective map $g_p$ of $I_n \setminus \{p\}$ onto $I_m$. The composite map
$h = g_p \circ f$ maps $I_{S(m)}$ into $I_m$ and is injective. Since $m \in M$, we must have $h(I_m) = I_m$.
But, since $h(S(m)) \in I_m$ and $h$ is injective, this is a contradiction. Hence $S(m) < n$ and,
since this holds for every $f$, $S(m) \in M$.


Proposition 5 For any $m, n \in \mathbb{N}$, if a map $f: I_m \to I_n$ is not injective and $f(I_m) = I_n$,
then $m > n$.


Proof The result holds vacuously when $m = 1$, since any map $f: I_1 \to I_n$ is injective.
Let $M$ be the set of all $m \in \mathbb{N}$ for which the result holds. We need only show that
if $m \in M$, then also $S(m) \in M$.
Let $f: I_{S(m)} \to I_n$ be a map such that $f(I_{S(m)}) = I_n$ which is not injective. Then
there exist $p, q \in I_{S(m)}$ with $p \neq q$ and $f(p) = f(q)$. We may choose the notation
so that $q \in I_m$. If $f_p$ is a bijective map of $I_m$ onto $I_{S(m)} \setminus \{p\}$, then the composite map
$h = f \circ f_p$ maps $I_m$ onto $I_n$, since the value $f(p)$ is also taken at $q \neq p$. If $h$ is not
injective then $m > n$, since $m \in M$, and hence also $S(m) > n$. If $h$ is injective, then it is
bijective and has a bijective inverse $h^{-1}: I_n \to I_m$. Since $h^{-1}(I_n)$ is a proper subset of
$I_{S(m)}$, it follows from Proposition 4 that $n < S(m)$. Hence $S(m) \in M$.


Propositions 4 and 5 immediately imply 


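Corollary 6 For any $n \in \mathbb{N}$, a map $f: I_n \to I_n$ is injective if and only if it is surjective.


Corollary 7 If a map $f: I_m \to I_n$ is bijective, then $m = n$.


Proof Regarding $f$ as an injective map of $I_m$ into $I_{S(n)}$ whose image $I_n$ is not all of
$I_{S(n)}$, we obtain from Proposition 4 that $m < S(n)$, i.e. $m \le n$. Replacing $f$ by $f^{-1}$, we
obtain in the same way $n \le m$. Hence $m = n$.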


A set $E$ is said to be finite if there exists a bijective map $f: E \to I_n$ for some
$n \in \mathbb{N}$. Then $n$ is uniquely determined, by Corollary 7. We call it the cardinality of $E$
and denote it by $\#(E)$.

It is readily shown that if $E$ is a finite set and $F$ a proper subset of $E$, then $F$ is
also finite and $\#(F) < \#(E)$. Again, if $E$ and $F$ are disjoint finite sets, then their union
$E \cup F$ is also finite and $\#(E \cup F) = \#(E) + \#(F)$. Furthermore, for any finite sets $E$
and $F$, the product set $E \times F$ is also finite and $\#(E \times F) = \#(E) \cdot \#(F)$.

Corollary 6 implies that, for any finite set $E$, a map $f: E \to E$ is injective if and
only if it is surjective. This is a precise statement of the so-called pigeonhole principle.
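For small $n$ the pigeonhole principle can be confirmed by brute force over all $n^n$ maps of $I_n$ into itself. A minimal Python sketch, assuming a map $f: I_m \to I_n$ is encoded as the tuple $(f(1), \ldots, f(m))$ (an illustrative encoding; the helper names are hypothetical); it checks Corollary 6 for $n \le 5$ and Corollary 7 for $m, n \le 4$. Exhaustive search only illustrates these results, whose proofs above cover every $n$.

```python
from itertools import product


def maps(m: int, n: int):
    """All maps I_m -> I_n, each encoded as the tuple (f(1), ..., f(m))."""
    return product(range(1, n + 1), repeat=m)


def injective(f) -> bool:
    return len(set(f)) == len(f)


def surjective(f, n: int) -> bool:
    return set(f) == set(range(1, n + 1))


def corollary_6_holds(n: int) -> bool:
    """Every map I_n -> I_n is injective exactly when it is surjective."""
    return all(injective(f) == surjective(f, n) for f in maps(n, n))


def bijection_exists(m: int, n: int) -> bool:
    return any(injective(f) and surjective(f, n) for f in maps(m, n))


print(all(corollary_6_holds(n) for n in range(1, 6)))  # True
print(all(bijection_exists(m, n) == (m == n)
          for m in range(1, 5) for n in range(1, 5)))  # True
```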

A set $E$ is said to be countably infinite if there exists a bijective map $f: E \to \mathbb{N}$.
Any countably infinite set may be bijectively mapped onto a proper subset $F$, since
$\mathbb{N}$ is bijectively mapped onto a proper subset by the successor map $S$. Thus a map
$f: E \to E$ of an infinite set $E$ may be injective, but not surjective. It may also be
surjective, but not injective; an example is the map $f: \mathbb{N} \to \mathbb{N}$ defined by $f(1) = 1$
and, for $n \neq 1$, $f(n) = m$ if $S(m) = n$.
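The two examples for $\mathbb{N}$ can be checked on an initial segment. A minimal Python sketch, assuming natural numbers are modelled by Python integers $n \ge 1$ (the names `successor`, `f` and `window` are illustrative): the successor map is injective but never takes the value $1$, while the map $f$ of the text takes every value but identifies $1$ and $2$. A finite window can only illustrate these properties of the infinite set $\mathbb{N}$.

```python
def successor(n: int) -> int:
    return n + 1


def f(n: int) -> int:
    """f(1) = 1 and, for n != 1, f(n) = m where S(m) = n."""
    return 1 if n == 1 else n - 1


window = range(1, 50)  # a finite initial segment of N, for illustration

images = [successor(n) for n in window]
assert len(set(images)) == len(images)   # successor is injective on the window
assert 1 not in images                   # ... but 1 is never a value (N2)

assert all(f(successor(m)) == m for m in window)  # every m is a value of f
assert f(1) == f(2) == 1                          # so f is not injective
print("ok")
```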


2 Integers and Rational Numbers 


The concept of number will now be extended. The natural numbers 1, 2, 3, ... suffice
for counting purposes, but for bank balance purposes we require the larger set ..., −2,
−1, 0, 1, 2, ... of integers. (From this point of view, −2 is not so ‘unnatural’.) An
important reason for extending the concept of number is the greater freedom it gives
us. In the realm of natural numbers the equation $a + x = b$ has a solution if and only
if $b > a$; in the extended realm of integers it will always have a solution.

Rather than introduce a new set of axioms for the integers, we will define them in terms of natural numbers. Intuitively, an integer is the difference $m - n$ of two natural numbers $m, n$, with addition and multiplication defined by

$$(m - n) + (p - q) = (m + p) - (n + q),$$
$$(m - n) \cdot (p - q) = (mp + nq) - (mq + np).$$
However, two other natural numbers $m', n'$ may have the same difference as $m, n$, and anyway what does $m - n$ mean if $m < n$? To make things precise, we proceed in the following way.

Consider the set $\mathbb{N} \times \mathbb{N}$ of all ordered pairs of natural numbers. For any two such ordered pairs, $(m, n)$ and $(m', n')$, we write

$$(m, n) \sim (m', n') \quad \text{if } m + n' = m' + n.$$
We will show that this is an equivalence relation. It follows at once from the definition that $(m, n) \sim (m, n)$ (reflexive law) and that $(m, n) \sim (m', n')$ implies $(m', n') \sim (m, n)$ (symmetric law). It remains to prove the transitive law:

$$(m, n) \sim (m', n') \text{ and } (m', n') \sim (m'', n'') \text{ imply } (m, n) \sim (m'', n'').$$


This follows from the commutative, associative and cancellation laws for addition in $\mathbb{N}$. For we have

$$m + n' = m' + n, \qquad m' + n'' = m'' + n',$$

and hence

$$(m + n') + n'' = (m' + n) + n'' = (m' + n'') + n = (m'' + n') + n.$$

Thus

$$(m + n'') + n' = (m'' + n) + n',$$

and so $m + n'' = m'' + n$.
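As a sanity check, the three laws can be verified mechanically on a finite piece of $\mathbb{N} \times \mathbb{N}$. The Python sketch below (the function name `equiv` and the bound `K` are our own, hypothetical choices) tests reflexivity, symmetry and transitivity for all pairs with entries in $\{1, \ldots, K\}$; it illustrates the argument above but is no substitute for it.

```python
from itertools import product


def equiv(x, y):
    """(m, n) ~ (m', n') exactly when m + n' == m' + n."""
    (m, n), (m2, n2) = x, y
    return m + n2 == m2 + n


K = 6  # hypothetical bound on the entries
pairs = list(product(range(1, K + 1), repeat=2))

reflexive = all(equiv(x, x) for x in pairs)
symmetric = all(equiv(y, x) for x in pairs for y in pairs if equiv(x, y))
transitive = all(equiv(x, z)
                 for x in pairs for y in pairs if equiv(x, y)
                 for z in pairs if equiv(y, z))
print(reflexive, symmetric, transitive)  # True True True
```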

The equivalence class containing $(1, 1)$ evidently consists of all pairs $(m, n)$ with $m = n$.

We define an integer to be an equivalence class of ordered pairs of natural numbers and, as is now customary, we denote the set of all integers by $\mathbb{Z}$.

Addition of integers is defined componentwise:

$$(m, n) + (p, q) = (m + p, n + q).$$


To justify this definition we must show that it does not depend on the choice of representatives within an equivalence class, i.e. that

$$(m, n) \sim (m', n') \text{ and } (p, q) \sim (p', q') \text{ imply } (m + p, n + q) \sim (m' + p', n' + q').$$

However, if

$$m + n' = m' + n, \qquad p + q' = p' + q,$$

then

$$\begin{aligned} (m + p) + (n' + q') &= (m + n') + (p + q') \\ &= (m' + n) + (p' + q) = (m' + p') + (n + q). \end{aligned}$$
It follows at once from the corresponding properties of natural numbers that, also in $\mathbb{Z}$, addition satisfies the commutative law (A2) and the associative law (A3). Moreover, the equivalence class $0$ (zero) containing $(1, 1)$ is an identity element for addition:


(A4) $a + 0 = a$ for every $a$.

Furthermore, the equivalence class containing $(n, m)$ is an additive inverse for the equivalence class containing $(m, n)$:

(A5) for each $a$, there exists $-a$ such that $a + (-a) = 0$.
From these properties we can now obtain 


Proposition 8 For all $a, b \in \mathbb{Z}$, the equation $a + x = b$ has a unique solution $x \in \mathbb{Z}$.


Proof It is clear that $x = (-a) + b$ is a solution. Moreover, this solution is unique, since if $a + x = a + x'$ then, by adding $-a$ to both sides, we obtain $x = x'$.


Proposition 8 shows that the cancellation law (A1) is a consequence of (A2)–(A5). It also immediately implies


Corollary 9 For each $a \in \mathbb{Z}$, $0$ is the only element such that $a + 0 = a$, $-a$ is uniquely determined by $a$, and $a = -(-a)$.


As usual, we will henceforth write $b - a$ instead of $b + (-a)$.
Multiplication of integers is defined by 


$$(m, n) \cdot (p, q) = (mp + nq, mq + np).$$


To justify this definition we must show that $(m, n) \sim (m', n')$ and $(p, q) \sim (p', q')$ imply

$$(mp + nq, mq + np) \sim (m'p' + n'q', m'q' + n'p').$$

From $m + n' = m' + n$, by multiplying by $p$ and $q$ we obtain

$$mp + n'p = m'p + np, \qquad m'q + nq = mq + n'q,$$

and from $p + q' = p' + q$, by multiplying by $m'$ and $n'$ we obtain

$$m'p + m'q' = m'p' + m'q, \qquad n'p' + n'q = n'p + n'q'.$$

Adding these four equations and cancelling the terms common to both sides, we get

$$(mp + nq) + (m'q' + n'p') = (m'p' + n'q') + (mq + np),$$


as required. 
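The componentwise definitions can be mirrored directly in code. In this Python sketch (all names are hypothetical), an integer is represented by any pair $(m, n)$ of natural numbers. `respects_equivalence` checks, over all pairs with entries in $\{1, \ldots, K\}$, that replacing the arguments by equivalent pairs gives equivalent sums and products, which is exactly what the two well-definedness arguments establish in general.

```python
from itertools import product


def equiv(x, y):
    """(m, n) ~ (m', n') exactly when m + n' == m' + n."""
    (m, n), (m2, n2) = x, y
    return m + n2 == m2 + n


def add(x, y):
    """(m, n) + (p, q) = (m + p, n + q)."""
    (m, n), (p, q) = x, y
    return (m + p, n + q)


def mul(x, y):
    """(m, n) . (p, q) = (mp + nq, mq + np)."""
    (m, n), (p, q) = x, y
    return (m * p + n * q, m * q + n * p)


def respects_equivalence(K):
    """Check well-definedness of add and mul on pairs with entries in {1, ..., K}."""
    pairs = list(product(range(1, K + 1), repeat=2))
    for x, x2, y, y2 in product(pairs, repeat=4):
        if equiv(x, x2) and equiv(y, y2):
            if not (equiv(add(x, y), add(x2, y2)) and equiv(mul(x, y), mul(x2, y2))):
                return False
    return True


print(respects_equivalence(4))  # True
```

Reading $(m, n)$ as $m - n$, the product of $(5, 2)$ and $(1, 4)$ is $(5 + 8, 20 + 2) = (13, 22)$, a representative of $3 \cdot (-3) = -9$.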

It is easily verified that, also in $\mathbb{Z}$, multiplication satisfies the commutative law (M2) and the associative law (M3). Moreover, the distributive law (AM1) holds and, if $1$ is the equivalence class containing $(1 + 1, 1)$, then (M4) also holds. (In practice it does not cause confusion to denote identity elements of $\mathbb{N}$ and $\mathbb{Z}$ by the same symbol.)
Proposition 10 For every $a \in \mathbb{Z}$, $a \cdot 0 = 0$.
Proof We have

$$a \cdot 0 = a \cdot (0 + 0) = a \cdot 0 + a \cdot 0.$$

Adding $-(a \cdot 0)$ to both sides, we obtain the result.


Proposition 10 could also have been derived directly from the definitions, but we 
prefer to view it as a consequence of the properties which have been labelled. 


Corollary 11 For all $a, b \in \mathbb{Z}$,
$$a(-b) = -(ab), \qquad (-a)(-b) = ab.$$
Proof The first relation follows from

$$ab + a(-b) = a(b + (-b)) = a \cdot 0 = 0,$$

and the second relation follows from the first, since $c = -(-c)$.


By the definitions of $0$ and $1$ we also have
(AM2) $1 \ne 0$.

(In fact $1 = 0$ would imply $a = 0$ for every $a$, since $a \cdot 1 = a$ and $a \cdot 0 = 0$.)

We will say that an integer $a$ is positive if it is represented by an ordered pair $(m, n)$ with $n < m$. This definition does not depend on the choice of representative. For if $n < m$ and $m + n' = m' + n$, then $m + n' < m' + m$ and hence $n' < m'$.

We will denote by $P$ the set of all positive integers. The law of trichotomy (O2) for natural numbers immediately implies

(P1) for every $a$, one and only one of the following alternatives holds:
$$a \in P, \qquad a = 0, \qquad -a \in P.$$


We say that an integer is negative if it has the form $-a$, where $a \in P$, and we denote by $-P$ the set of all negative integers. Since $a = -(-a)$, (P1) says that $\mathbb{Z}$ is the disjoint union of the sets $P$, $\{0\}$ and $-P$.

From the property (O3) of natural numbers we immediately obtain

(P2) if $a \in P$ and $b \in P$, then $a + b \in P$.

Furthermore, we have

(P3) if $a \in P$ and $b \in P$, then $a \cdot b \in P$.


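Since the product of the classes of $(m, n)$ and $(p, q)$ is the class of $(mp + nq, mq + np)$, to prove this we need only show that if $m, n, p, q$ are natural numbers such that $n < m$ and $q < p$, then

$$mq + np < mp + nq.$$

Since $q < p$, there exists a natural number $q'$ such that $q + q' = p$. But then $nq' < mq'$, since $n < m$, and hence

$$mq + np = (m + n)q + nq' < (m + n)q + mq' = mp + nq.$$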
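The inequality used for (P3) is easy to spot-check numerically. The following Python sketch (the function name `check_p3` and the bound `K` are hypothetical) runs over all $m, n, p, q \le K$ with $n < m$ and $q < p$, verifying both equalities in the last display and the inequality itself.

```python
from itertools import product


def check_p3(K):
    """For 1 <= n < m <= K and 1 <= q < p <= K, check mq + np < mp + nq."""
    for m, n, p, q in product(range(1, K + 1), repeat=4):
        if n < m and q < p:
            q1 = p - q  # the natural number q' with q + q' = p
            assert m * q + n * p == (m + n) * q + n * q1
            assert (m + n) * q + m * q1 == m * p + n * q
            if not m * q + n * p < m * p + n * q:
                return False
    return True


print(check_p3(8))  # True
```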
We may write (P2) and (P3) symbolically in the form
$$P + P \subseteq P, \qquad P \cdot P \subseteq P.$$
We now show that there are no divisors of zero in $\mathbb{Z}$:
Proposition 12 If $a \ne 0$ and $b \ne 0$, then $ab \ne 0$.
Proof By (P1), either $a$ or $-a$ is positive, and either $b$ or $-b$ is positive. If $a \in P$ and $b \in P$ then $ab \in P$, by (P3), and hence $ab \ne 0$, by (P1). If $a \in P$ and $-b \in P$, then $a(-b) \in P$. Hence $ab = -(a(-b)) \in -P$ and $ab \ne 0$. Similarly if $-a \in P$ and $b \in P$. Finally, if $-a \in P$ and $-b \in P$, then $ab = (-a)(-b) \in P$ and again $ab \ne 0$.


The proof of Proposition 12 also shows that any nonzero square is positive:
Proposition 13 If $a \ne 0$, then $a^2 := aa \in P$.

It follows that $1 \in P$, since $1 \ne 0$ and $1^2 = 1$.
The set $P$ of positive integers induces an order relation in $\mathbb{Z}$. Write

$$a < b \quad \text{if } b - a \in P,$$

so that $a \in P$ if and only if $0 < a$. From this definition and the properties of $P$ it follows that the order properties (O1)–(O3) hold also in $\mathbb{Z}$, and that (O4) holds in the modified form:

(O4)' if $0 < c$, then $ac < bc$ if and only if $a < b$.


We now show that we can represent any $a \in \mathbb{Z}$ in the form $a = b - c$, where $b, c \in P$. In fact, if $a = 0$, we can take $b = 1$ and $c = 1$; if $a \in P$, we can take $b = a + 1$ and $c = 1$; and if $-a \in P$, we can take $b = 1$ and $c = 1 - a$.

An element $a$ of $\mathbb{Z}$ is said to be a lower bound for a subset $X$ of $\mathbb{Z}$ if $a \le x$ for every $x \in X$. Proposition 3 immediately implies that if a nonempty subset of $\mathbb{Z}$ has a lower bound, then it has a least element.

For any $n \in \mathbb{N}$, let $n'$ be the integer represented by $(n + 1, 1)$. Then $n' \in P$. We are going to study the map $n \mapsto n'$ of $\mathbb{N}$ into $P$. The map is injective, since $n' = m'$ implies $n = m$. It is also surjective, since if $a \in P$ is represented by $(m, n)$, where $n < m$, then it is also represented by $(p + 1, 1)$, where $p \in \mathbb{N}$ satisfies $n + p = m$. It is easily verified that the map preserves sums and products:

$$(m + n)' = m' + n', \qquad (mn)' = m'n'.$$

Since $1' = 1$, it follows that $S(n)' = n' + 1$. Furthermore, we have

$$m' < n' \quad \text{if and only if} \quad m < n.$$


Thus the map $n \mapsto n'$ establishes an ‘isomorphism’ of $\mathbb{N}$ with $P$. In other words, $P$ is a copy of $\mathbb{N}$ situated within $\mathbb{Z}$. By identifying $n$ with $n'$, we may regard $\mathbb{N}$ itself as a subset of $\mathbb{Z}$ (and stop talking about $P$). Then ‘natural number’ is the same as ‘positive integer’ and any integer is the difference of two natural numbers.

Number theory, in its most basic form, is the study of the properties of the set $\mathbb{Z}$ of integers. It will be considered in some detail in later chapters of this book, but to relieve the abstraction of the preceding discussion we consider here the division algorithm:
Proposition 14 For any integers $a, b$ with $a > 0$, there exist unique integers $q, r$ such that

$$b = qa + r, \qquad 0 \le r < a.$$
Proof We consider first uniqueness. Suppose
$$qa + r = q'a + r', \qquad 0 \le r, r' < a.$$
If $r < r'$, then from
$$(q - q')a = r' - r,$$
we obtain first $q > q'$ and then $r' - r \ge a$, which contradicts $r' - r < a$. If $r' < r$, we obtain a contradiction similarly. Hence $r = r'$, which implies $q = q'$.

We consider next existence. Let $S$ be the set of all integers $y \ge 0$ which can be represented in the form $y = b - xa$ for some $x \in \mathbb{Z}$. The set $S$ is not empty, since it contains $b - 0a$ if $b \ge 0$ and $b - ba$ if $b < 0$. Hence $S$ contains a least element $r$. Then $b = qa + r$, where $q, r \in \mathbb{Z}$ and $r \ge 0$. Since $r - a = b - (q + 1)a$ and $r$ is the least element in $S$, we must also have $r < a$.
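The existence proof is constructive enough to run. The Python sketch below (the name `divide` is ours) starts from the element of $S$ named in the proof ($b - 0a$ if $b \ge 0$, $b - ba$ if $b < 0$) and steps down by $a$ while the result stays in $S$, stopping at the least element $r$. It is deliberately naive: the number of steps grows with $|b|$. For $a > 0$ it agrees with Python's built-in `divmod`.

```python
def divide(b, a):
    """Return (q, r) with b == q*a + r and 0 <= r < a, for integers a > 0 and b.

    Mirrors the existence proof: begin with an element of
    S = {b - x*a : x in Z, b - x*a >= 0} and descend to its least element.
    """
    if a <= 0:
        raise ValueError("a must be positive")
    x = 0 if b >= 0 else b   # b - 0*a, or b - b*a = b*(1 - a) >= 0, lies in S
    r = b - x * a
    while r - a >= 0:        # r - a = b - (x + 1)*a is a smaller element of S
        x += 1
        r -= a
    return x, r              # r is the least element of S, so 0 <= r < a


print(divide(17, 5), divide(-7, 3))  # (3, 2) (-3, 2)
```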


The concept of number will now be further extended to include ‘fractions’ or 
‘rational numbers’. For measuring lengths the integers do not suffice, since the length 
of a given segment may not be an exact multiple of the chosen unit of length. Similarly 
for measuring weights, if we find that three identical coins balance five of the chosen 
unit weights, then we ascribe to each coin the weight 5/3. In the realm of integers the 
equation $ax = b$ frequently has no solution; in the extended realm of rational numbers it will always have a solution if $a \ne 0$.

Intuitively, a rational number is the ratio or ‘quotient’ $a/b$ of two integers $a, b$, where $b \ne 0$, with addition and multiplication defined by

$$a/b + c/d = (ad + cb)/bd,$$
$$a/b \cdot c/d = ac/bd.$$


However, two other integers $a', b'$ may have the same ratio as $a, b$, and anyway what does $a/b$ mean? To make things precise, we proceed in much the same way as before.

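Put $\mathbb{Z}^\times = \mathbb{Z} \setminus \{0\}$ and consider the set $\mathbb{Z} \times \mathbb{Z}^\times$ of all ordered pairs $(a, b)$ with $a \in \mathbb{Z}$ and $b \in \mathbb{Z}^\times$. For any two such ordered pairs, $(a, b)$ and $(a', b')$, we write

$$(a, b) \sim (a', b') \quad \text{if } ab' = a'b.$$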


To show that this is an equivalence relation it is again enough to verify that $(a, b) \sim (a', b')$ and $(a', b') \sim (a'', b'')$ imply $(a, b) \sim (a'', b'')$. The same calculation as before, with addition replaced by multiplication, shows that $(ab'')b' = (a''b)b'$. Since $b' \ne 0$, it follows (by Proposition 12) that $ab'' = a''b$.

The equivalence class containing $(0, 1)$ evidently consists of all pairs $(0, b)$ with $b \ne 0$, and the equivalence class containing $(1, 1)$ consists of all pairs $(b, b)$ with $b \ne 0$.

We define a rational number to be an equivalence class of elements of $\mathbb{Z} \times \mathbb{Z}^\times$ and, as is now customary, we denote the set of all rational numbers by $\mathbb{Q}$.
Addition of rational numbers is defined by
$$(a, b) + (c, d) = (ad + cb, bd),$$
where $bd \ne 0$ since $b \ne 0$ and $d \ne 0$. To justify the definition we must show that
$$(a, b) \sim (a', b') \text{ and } (c, d) \sim (c', d') \text{ imply } (ad + cb, bd) \sim (a'd' + c'b', b'd').$$
But if $ab' = a'b$ and $cd' = c'd$, then

$$\begin{aligned} (ad + cb)(b'd') &= (ab')(dd') + (cd')(bb') \\ &= (a'b)(dd') + (c'd)(bb') = (a'd' + c'b')(bd). \end{aligned}$$


It is easily verified that, also in $\mathbb{Q}$, addition satisfies the commutative law (A2) and the associative law (A3). Moreover (A4) and (A5) also hold, the equivalence class $0$ containing $(0, 1)$ being an identity element for addition and the equivalence class containing $(-b, c)$ being the additive inverse of the equivalence class containing $(b, c)$.

Multiplication of rational numbers is defined componentwise:

$$(a, b) \cdot (c, d) = (ac, bd).$$

To justify the definition we must show that
$$(a, b) \sim (a', b') \text{ and } (c, d) \sim (c', d') \text{ imply } (ac, bd) \sim (a'c', b'd').$$
But if $ab' = a'b$ and $cd' = c'd$, then
$$(ac)(b'd') = (ab')(cd') = (a'b)(c'd) = (a'c')(bd).$$
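The same kind of mechanical check applies to rational numbers. In this Python sketch (names hypothetical), a pair $(a, b)$ with $b \ne 0$ stands for $a/b$. `respects_equivalence` tests the two well-definedness statements on all pairs with entries in $\{-R, \ldots, R\}$. The standard library's `fractions.Fraction` is used only as an independent reference for the value $a/b$.

```python
from fractions import Fraction
from itertools import product


def equiv(x, y):
    """(a, b) ~ (a', b') exactly when a * b' == a' * b (b, b' nonzero)."""
    (a, b), (a2, b2) = x, y
    return a * b2 == a2 * b


def add(x, y):
    """(a, b) + (c, d) = (ad + cb, bd)."""
    (a, b), (c, d) = x, y
    return (a * d + c * b, b * d)


def mul(x, y):
    """(a, b) . (c, d) = (ac, bd)."""
    (a, b), (c, d) = x, y
    return (a * c, b * d)


def respects_equivalence(R):
    """Check well-definedness of add and mul on pairs with entries in [-R, R]."""
    nonzero = [v for v in range(-R, R + 1) if v != 0]
    pairs = [(a, b) for a in range(-R, R + 1) for b in nonzero]
    equivalent = [(x, x2) for x, x2 in product(pairs, repeat=2) if equiv(x, x2)]
    for (x, x2), (y, y2) in product(equivalent, repeat=2):
        if not (equiv(add(x, y), add(x2, y2)) and equiv(mul(x, y), mul(x2, y2))):
            return False
    return True


def value(x):
    """The rational number a/b as a Fraction, for comparison only."""
    return Fraction(x[0], x[1])


print(respects_equivalence(2))  # True
```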


It is easily verified that, also in $\mathbb{Q}$, multiplication satisfies the commutative law (M2) and the associative law (M3). Moreover (M4) also holds, the equivalence class $1$ containing $(1, 1)$ being an identity element for multiplication. Furthermore, addition and multiplication are connected by the distributive law (AM1), and (AM2) also holds since $(0, 1)$ is not equivalent to $(1, 1)$.

Unlike the situation for $\mathbb{Z}$, however, every nonzero element of $\mathbb{Q}$ has a multiplicative inverse:

(M5) for each $a \ne 0$, there exists $a^{-1}$ such that $aa^{-1} = 1$.

In fact, if $a$ is represented by $(b, c)$, then $b \ne 0$, since $a \ne 0$, and $a^{-1}$ is represented by $(c, b)$.

It follows that, for all $a, b \in \mathbb{Q}$ with $a \ne 0$, the equation $ax = b$ has a unique solution $x \in \mathbb{Q}$, namely $x = a^{-1}b$. Hence, if $a \ne 0$, then $1$ is the only solution of $ax = a$, $a^{-1}$ is uniquely determined by $a$, and $a = (a^{-1})^{-1}$.

We will say that a rational number $a$ is positive if it is represented by an ordered pair $(b, c)$ of integers for which $bc > 0$. This definition does not depend on the choice of representative. For suppose $0 < bc$ and $bc' = b'c$. Then $bc' \ne 0$, since $b \ne 0$ and $c' \ne 0$, and hence $0 < (bc')^2$. Since $(bc')^2 = (bc)(b'c')$ and $0 < bc$, it follows that $0 < b'c'$.

Our previous use of $P$ having been abandoned in favour of $\mathbb{N}$, we will now denote by $P$ the set of all positive rational numbers and by $-P$ the set of all rational numbers $-a$, where $a \in P$. From the corresponding result for $\mathbb{Z}$, it follows that (P1) continues to hold in $\mathbb{Q}$. We will show that (P2) and (P3) also hold.

To see that the sum of two positive rational numbers is again positive, we observe that if $a, b, c, d$ are integers such that $0 < ab$ and $0 < cd$, then also

$$0 < (ab)d^2 + (cd)b^2 = (ad + cb)(bd).$$

To see that the product of two positive rational numbers is again positive, we observe that if $a, b, c, d$ are integers such that $0 < ab$ and $0 < cd$, then also

$$0 < (ab)(cd) = (ac)(bd).$$


Since (P1)–(P3) all hold, it follows as before that Propositions 12 and 13 also hold in $\mathbb{Q}$. Hence $1 \in P$ and (O4)' now implies that $a^{-1} \in P$ if $a \in P$. If $a, b \in P$ and $a < b$, then $b^{-1} < a^{-1}$, since $bb^{-1} = 1 = aa^{-1} < ba^{-1}$.

The set $P$ of positive elements now induces an order relation on $\mathbb{Q}$. We write $a < b$ if $b - a \in P$, so that $a \in P$ if and only if $0 < a$. Then the order relations (O1)–(O3) and (O4)' continue to hold in $\mathbb{Q}$.

Unlike the situation for $\mathbb{Z}$, however, the ordering of $\mathbb{Q}$ is dense, i.e. if $a, b \in \mathbb{Q}$ and $a < b$, then there exists $c \in \mathbb{Q}$ such that $a < c < b$. For example, we can take $c$ to be the solution of $(1 + 1)c = a + b$.
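With exact rational arithmetic, for instance Python's `fractions.Fraction`, the density argument is a one-liner: $c = (a + b)/(1 + 1)$ is the midpoint. The helper name `between` below is hypothetical.

```python
from fractions import Fraction


def between(a, b):
    """For rationals a < b, return the solution c of (1 + 1)c = a + b."""
    a, b = Fraction(a), Fraction(b)
    if not a < b:
        raise ValueError("need a < b")
    return (a + b) / 2


c = between(Fraction(1, 3), Fraction(1, 2))
print(c)  # 5/12
```

Applying `between` repeatedly yields infinitely many distinct rationals between any two.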

Let $\mathbb{Z}'$ denote the set of all rational numbers $a'$ which can be represented by $(a, 1)$ for some $a \in \mathbb{Z}$. For every $c \in \mathbb{Q}$, there exist $a', b' \in \mathbb{Z}'$ with $b' \ne 0$ such that $c = a'(b')^{-1}$. In fact, if $c$ is represented by $(a, b)$, we can take $a'$ to be represented by $(a, 1)$ and $b'$ by $(b, 1)$. Instead of $c = a'(b')^{-1}$, we also write $c = a'/b'$.

For any $a \in \mathbb{Z}$, let $a'$ be the rational number represented by $(a, 1)$. The map $a \mapsto a'$ of $\mathbb{Z}$ into $\mathbb{Z}'$ is clearly bijective. Moreover, it preserves sums and products:

$$(a + b)' = a' + b', \qquad (ab)' = a'b'.$$

Furthermore,

$$a' < b' \quad \text{if and only if} \quad a < b.$$

Thus the map $a \mapsto a'$ establishes an ‘isomorphism’ of $\mathbb{Z}$ with $\mathbb{Z}'$, and $\mathbb{Z}'$ is a copy of $\mathbb{Z}$ situated within $\mathbb{Q}$. By identifying $a$ with $a'$, we may regard $\mathbb{Z}$ itself as a subset of $\mathbb{Q}$. Then any rational number is the ratio of two integers.

By way of illustration, we show that if $a$ and $b$ are positive rational numbers, then there exists a positive integer $l$ such that $la > b$. For if $a = m/n$ and $b = p/q$, where $m, n, p, q$ are positive integers, then

$$(np + 1)a > pm \ge p \ge b.$$
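The multiplier in this argument is explicit, $l = np + 1$, so it can be computed directly. In the Python sketch below (hypothetical name `archimedean_multiplier`), `Fraction` stores $a$ and $b$ in lowest terms. This is harmless, since the argument works for any representation $a = m/n$, $b = p/q$ by positive integers.

```python
from fractions import Fraction


def archimedean_multiplier(a, b):
    """For positive rationals a = m/n and b = p/q, return l = n*p + 1, so that l*a > b."""
    a, b = Fraction(a), Fraction(b)
    if a <= 0 or b <= 0:
        raise ValueError("a and b must be positive")
    n, p = a.denominator, b.numerator
    return n * p + 1


print(archimedean_multiplier(Fraction(1, 1000), Fraction(999, 2)))  # 999001
```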
It was discovered by the ancient Greeks that even rational numbers do not suffice for the measurement of lengths. If $x$ is the length of the hypotenuse of a right-angled triangle whose other two sides have unit length then, by Pythagoras’ theorem, $x^2 = 2$. But it was proved, probably by a disciple of Pythagoras, that there is no rational number $x$ such that $x^2 = 2$. (A more general result is proved in Book X, Proposition 9 of Euclid’s Elements.) We give here a somewhat different proof from the classical one.

Assume that such a rational number $x$ exists. Since $x$ may be replaced by $-x$, we may suppose that $x = m/n$, where $m, n \in \mathbb{N}$. Then $m^2 = 2n^2$. Among all pairs $m, n$ of positive integers with this property, there exists one for which $n$ is least. If we put

$$p = 2n - m, \qquad q = m - n,$$

then $p$ and $q$ are positive integers, since clearly $n < m < 2n$. But, using $m^2 = 2n^2$ in the middle step,

$$p^2 = 4n^2 - 4mn + m^2 = 2(m^2 - 2mn + n^2) = 2q^2.$$

Since $q < n$, this contradicts the minimality of $n$.
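The heart of the argument is the algebraic identity $p^2 - 2q^2 = -(m^2 - 2n^2)$ for $p = 2n - m$, $q = m - n$, which holds for all integers $m, n$. The Python sketch below (helper name `descend` is hypothetical) checks the identity on a range of values and, consistently with the theorem, finds no pair with $m^2 = 2n^2$ among small $n$. It relies on `math.isqrt` (Python 3.8 or later) for integer square roots.

```python
import math


def descend(m, n):
    """The substitution (m, n) -> (p, q) = (2n - m, m - n) used in the proof."""
    return 2 * n - m, m - n


# The identity p^2 - 2q^2 == -(m^2 - 2n^2) holds for all integers m, n,
# so m^2 = 2n^2 would give p^2 = 2q^2 with a smaller positive q.
identity_ok = all(
    p * p - 2 * q * q == -(m * m - 2 * n * n)
    for m in range(-50, 51) for n in range(-50, 51)
    for p, q in [descend(m, n)]
)

# Consistent with the theorem: no n below 10**5 makes 2n^2 a perfect square.
no_solution = all(math.isqrt(2 * n * n) ** 2 != 2 * n * n for n in range(1, 10**5))
print(identity_ok, no_solution)  # True True
```

Applied to near-solutions the substitution still descends: starting from $(17, 12)$, for which $m^2 - 2n^2 = 1$, it produces $(7, 5)$, $(3, 2)$ and $(1, 1)$, with $m^2 - 2n^2$ alternating between $-1$ and $1$.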

If we think of the rational numbers as measuring distances of points on a line from 
a given origin O on the line (with distances on one side of O positive and distances on 
the other side negative), this means that, even though a dense set of points is obtained 
in this way, not all points of the line are accounted for. In order to fill in the gaps the 
concept of number will now be extended from ‘rational number’ to ‘real number’. 

It is possible to define real numbers as infinite decimal expansions, the rational 
numbers being those whose decimal expansions are eventually periodic. However, the 
choice of base 10 is arbitrary and carrying through this approach is awkward. 

There are two other commonly used approaches, one based on order and the other 
on distance. The first was proposed by Dedekind (1872), the second by Méray (1869) 
and Cantor (1872). We will follow Dedekind’s approach, since it is conceptually simpler.
However, the second method is also important and in a sense more general. In
Chapter VI we will use it to extend the rational numbers to the p-adic numbers. 

It is convenient to carry out Dedekind’s construction in two stages. We will first 
define ‘cuts’ (which are just the positive real numbers), and then pass from cuts to 
arbitrary real numbers in the same way that we passed from the natural numbers to the 
integers. 

Intuitively, a cut is the set of all rational numbers which represent points of the line between the origin $O$ and some other point. More formally, we define a cut to be a nonempty proper subset $A$ of the set $P$ of all positive rational numbers such that

(i) if $a \in A$, $b \in P$ and $b < a$, then $b \in A$;
(ii) if $a \in A$, then there exists $a' \in A$ such that $a < a'$.


For example, the set $I$ of all positive rational numbers $a < 1$ is a cut. Similarly, the set $T$ of all positive rational numbers $a$ such that $a^2 < 2$ is a cut. We will denote the set of all cuts by $\mathcal{P}$.

For any $A, B \in \mathcal{P}$ we write $A < B$ if $A$ is a proper subset of $B$. We will show that this induces a total order on $\mathcal{P}$.

It is clear that if $A < B$ and $B < C$, then $A < C$. It remains to show that, for any $A, B \in \mathcal{P}$, one and only one of the following alternatives holds:

$$A < B, \qquad A = B, \qquad B < A.$$
It is obvious from the definition by set inclusion that at most one holds. Now suppose that neither $A < B$ nor $A = B$. Then there exists $a \in A \setminus B$. It follows from (i), applied to $B$, that every $b \in B$ satisfies $b < a$ and then from (i), applied to $A$, that $b \in A$. Thus $B < A$.

Let $\mathcal{S}$ be any nonempty collection of cuts. A cut $B$ is said to be an upper bound for $\mathcal{S}$ if $A \le B$ for every $A \in \mathcal{S}$, and a lower bound for $\mathcal{S}$ if $B \le A$ for every $A \in \mathcal{S}$. An upper bound for $\mathcal{S}$ is said to be a least upper bound or supremum for $\mathcal{S}$ if it is a lower bound for the collection of all upper bounds. Similarly, a lower bound for $\mathcal{S}$ is said to be a greatest lower bound or infimum for $\mathcal{S}$ if it is an upper bound for the collection of all lower bounds. Clearly, $\mathcal{S}$ has at most one supremum and at most one infimum.

The set $\mathcal{P}$ has the following basic property:

(P4) if a nonempty subset has an upper bound, then it has a least upper bound.


Proof Let $C$ be the union of all sets $A \in \mathcal{S}$. By hypothesis there exists a cut $B$ such that $A \subseteq B$ for every $A \in \mathcal{S}$. Since $C \subseteq B$ for any such $B$, and $A \subseteq C$ for every $A \in \mathcal{S}$, we need only show that $C$ is a cut.

Evidently $C$ is a nonempty proper subset of $P$, since $B \ne P$. Suppose $c \in C$. Then $c \in A$ for some $A \in \mathcal{S}$. If $d \in P$ and $d < c$, then $d \in A$, since $A$ is a cut. Furthermore $c < a'$ for some $a' \in A$. Since $A \subseteq C$, this proves that $C$ is a cut.


In the set $P$ of positive rational numbers, the subset $T$ of all $x \in P$ such that $x^2 < 2$ has an upper bound, but no least upper bound. Thus (P4) shows that there is a difference between the total order on $P$ and that on $\mathcal{P}$.
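A concrete rational map makes both properties of $T$ visible. For positive rational $x$, put $x' = (2x + 2)/(x + 2)$ (a standard choice, not taken from the text). Then $x' - x = (2 - x^2)/(x + 2)$ and $x'^2 - 2 = 2(x^2 - 2)/(x + 2)^2$. So if $x \in T$, then $x < x'$ and $x' \in T$, which gives property (ii). If $y \in P$ has $y^2 > 2$ (these are exactly the upper bounds of $T$ in $P$, since no rational squares to $2$), then $y' < y$ is a smaller upper bound, so $T$ has no least upper bound in $P$. A Python sketch with exact `Fraction` arithmetic (helper names hypothetical):

```python
from fractions import Fraction


def in_T(x):
    """Membership in T = {x in P : x^2 < 2}."""
    return x > 0 and x * x < 2


def improve(x):
    """x -> (2x + 2)/(x + 2) for positive rational x."""
    x = Fraction(x)
    return (2 * x + 2) / (x + 2)


x = Fraction(1)      # an element of T
y = Fraction(3, 2)   # an upper bound for T: y^2 = 9/4 > 2
for _ in range(4):
    x, y = improve(x), improve(y)
    assert in_T(x) and y * y > 2 and x < y   # x grows inside T, y shrinks above it
print(x, y)  # 41/29 99/70
```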

We now define addition of cuts. For any $A, B \in \mathcal{P}$, let $A + B$ denote the set of all rational numbers $a + b$, with $a \in A$ and $b \in B$. We will show that also $A + B \in \mathcal{P}$. Evidently $A + B$ is a nonempty subset of $P$. It is also a proper subset. For choose $c \in P \setminus A$ and $d \in P \setminus B$. Then, by (i), $a < c$ for all $a \in A$ and $b < d$ for all $b \in B$. Since $a + b < c + d$ for all $a \in A$ and $b \in B$, it follows that $c + d \notin A + B$.

Suppose now that $a \in A$, $b \in B$ and that $c \in P$ satisfies $c < a + b$. If $c > b$, then $c = b + d$ for some $d \in P$, and $d < a$. Hence, by (i), $d \in A$ and $c = d + b \in A + B$. Similarly, $c \in A + B$ if $c > a$. Finally, if $c \le a$ and $c \le b$, choose $e \in P$ so that $e < c$. Then $e \in A$ and $c = e + f$ for some $f \in P$. Then $f \in B$, since $f < c$, and $c = e + f \in A + B$.

Thus $A + B$ has the property (i). It is trivial that $A + B$ also has the property (ii), since if $a \in A$ and $b \in B$, there exists $a' \in A$ such that $a < a'$ and then $a + b < a' + b$. This completes the proof that $A + B$ is a cut.

It follows at once from the corresponding properties of rational numbers that addition of cuts satisfies the commutative law (A2) and the associative law (A3).

We consider next the connection between addition and order. 


Lemma 15 For any cut $A$ and any $c \in P$, there exists $a \in A$ such that $a + c \notin A$.


Proof If $c \notin A$, then $a + c \notin A$ for every $a \in A$, since $c < a + c$. Thus we may suppose $c \in A$. Choose $b \in P \setminus A$. For some positive integer $n$ we have $b < nc$ (as shown earlier for positive rational numbers) and hence $nc \notin A$. If $n$ is the least positive integer such that $nc \notin A$, then $n > 1$ and $(n - 1)c \in A$. Consequently we can take $a = (n - 1)c$.
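The proof of Lemma 15 is an explicit search, which can be run when a cut is given by a membership test. In this Python sketch, the names are hypothetical and the cut is assumed to be given as a predicate `in_A` on positive rationals. In the case $c \notin A$ we only need some element of $A$, which we find by halving $c$: since $A$ is nonempty, property (i) guarantees that $c/2^k \in A$ for large $k$. In the case $c \in A$ the loop stops because $A$ is a proper subset of $P$.

```python
from fractions import Fraction


def in_T(x):
    """Membership test for the cut T = {x in P : x^2 < 2}."""
    return x > 0 and x * x < 2


def lemma_15(in_A, c):
    """Return some a in A with a + c not in A, for a cut A and a rational c > 0."""
    c = Fraction(c)
    if not in_A(c):
        # Then a + c is not in A for every a in A, since c < a + c;
        # any element of A will do, so halve c until we land in A.
        a = c
        while not in_A(a):
            a /= 2
        return a
    # Least positive integer n with nc not in A; then n > 1 and (n - 1)c lies in A.
    n = 1
    while in_A(n * c):
        n += 1
    return (n - 1) * c


a = lemma_15(in_T, Fraction(1, 1000))
print(a)  # 707/500
```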
Proposition 16 For any cuts $A, B$, there exists a cut $C$ such that $A + C = B$ if and only if $A < B$.


Proof We prove the necessity of the condition by showing that $A < A + C$ for any cuts $A, C$. If $a \in A$ and $c \in C$, then $a < a + c$. Since $A + C$ is a cut, it follows that $a \in A + C$. Consequently $A \le A + C$, and Lemma 15 implies that $A \ne A + C$.

Suppose now that $A$ and $B$ are cuts such that $A < B$, and let $C$ be the set of all $c \in P$ such that $c + d \in B$ for some $d \in P \setminus A$. We are going to show that $C$ is a cut and that $A + C = B$.

The set $C$ is not empty. For choose $b \in B \setminus A$ and then $b' \in B$ with $b < b'$. Then $b' = b + c'$ for some $c' \in P$, which implies $c' \in C$. On the other hand, $C \subseteq B$, since $c + d \in B$ and $d \in P$ imply $c \in B$. Thus $C$ is a proper subset of $P$.

Suppose $c \in C$, $p \in P$ and $p < c$. We have $c + d \in B$ for some $d \in P \setminus A$ and $c = p + e$ for some $e \in P$. Since $d + e \in P \setminus A$ and $p + (d + e) = c + d \in B$, it follows that $p \in C$.

Suppose now that $c \in C$, so that $c + d \in B$ for some $d \in P \setminus A$. Choose $b \in B$ so that $c + d < b$. Then $b = c + d + e$ for some $e \in P$. If we put $c' = c + e$, then $c < c'$. Moreover $c' \in C$, since $c' + d = b$. This completes the proof that $C$ is a cut.

Suppose $a \in A$ and $c \in C$. Then $c + d \in B$ for some $d \in P \setminus A$. Hence $a < d$. It follows that $a + c < c + d$, and so $a + c \in B$. Thus $A + C \le B$.

It remains to show that $B \le A + C$. Pick any $b \in B$. If $b \in A$, then also $b \in A + C$, since $A < A + C$. Thus we now assume $b \notin A$. Choose $b' \in B$ with $b < b'$. Then $b' = b + d$ for some $d \in P$. By Lemma 15, there exists $a \in A$ such that $a + d \notin A$. Moreover $a < b$, since $b \notin A$, and hence $b = a + c$ for some $c \in P$. Since $c + (a + d) = b + d = b'$, it follows that $c \in C$. Thus $b \in A + C$ and $B \le A + C$.


We can now show that addition of cuts satisfies the order relation (O3). Suppose first that $A < B$. Then, by Proposition 16, there exists a cut $D$ such that $A + D = B$. Hence, for any cut $C$,

$$A + C < (A + C) + D = B + C.$$

Suppose next that $A + C < B + C$. Then $A \ne B$. Since $B < A$ would imply $B + C < A + C$, by what we have just proved, it follows from the law of trichotomy that $A < B$.

From (O3) and the law of trichotomy, it follows that addition of cuts satisfies the 
cancellation law (A1). 

We next define multiplication of cuts. For any $A, B \in \mathcal{P}$, let $AB$ denote the set of all rational numbers $ab$, with $a \in A$ and $b \in B$. In the same way as for $A + B$, it may be shown that $AB \in \mathcal{P}$. We note only that if $a \in A$, $b \in B$ and $c < ab$, then $b^{-1}c < a$. Hence $b^{-1}c \in A$ and $c = (b^{-1}c)b \in AB$.

It follows from the corresponding properties of rational numbers that multiplication of cuts satisfies the commutative law (M2) and the associative law (M3). Moreover (M4) holds, the identity element for multiplication being the cut $I$ consisting of all positive rational numbers less than $1$.

We now show that the distributive law (AM1) also holds. The distributive law for rational numbers shows at once that

$$A(B + C) \le AB + AC.$$
It remains to show that $a_1 b + a_2 c \in A(B + C)$ if $a_1, a_2 \in A$, $b \in B$ and $c \in C$. But

$$a_1 b + a_2 c \le a_2(b + c) \quad \text{if } a_1 \le a_2,$$

and

$$a_1 b + a_2 c \le a_1(b + c) \quad \text{if } a_2 \le a_1.$$

In either event it follows that $a_1 b + a_2 c \in A(B + C)$.

We can now show that multiplication of cuts satisfies the order relation (O4). If $A < B$, then there exists a cut $D$ such that $A + D = B$ and hence $AC < AC + DC = BC$. Conversely, suppose $AC < BC$. Then $A \ne B$. Since $B < A$ would imply $BC < AC$, it follows that $A < B$.

From (O4) and the law of trichotomy (O2) it follows that multiplication of cuts 
satisfies the cancellation law (M1). 

We next prove the existence of multiplicative inverses. The proof will use the following multiplicative analogue of Lemma 15:


Lemma 17 For any cut $A$ and any $c \in P$ with $c > 1$, there exists $a \in A$ such that $ac \notin A$.


Proof Choose any $b \in A$. We may suppose $bc \in A$, since otherwise we can take $a = b$. Since $b < bc$, we have $bc = b + d$ for some $d \in P$. By Lemma 15 we can choose $a \in A$ so that $a + d \notin A$. Since $b + d \in A$, it follows that $b + d < a + d$, and so $b < a$. Hence $ab^{-1} > 1$ and

$$a + d < a + (ab^{-1})d = ab^{-1}(b + d) = ac.$$

Since $a + d \notin A$, it follows that $ac \notin A$.


Proposition 18 For any $A \in \mathcal{P}$, there exists $A^{-1} \in \mathcal{P}$ such that $AA^{-1} = I$.


Proof Let $A^{-1}$ be the set of all $b \in P$ such that $b < c^{-1}$ for some $c \in P \setminus A$. It is easily verified that $A^{-1}$ is a cut. We note only that $a^{-1} \notin A^{-1}$ if $a \in A$ and that, if $b < c^{-1}$, then also $b < d^{-1}$ for some $d > c$.

We now show that $AA^{-1} = I$. If $a \in A$ and $b \in A^{-1}$ then $ab < 1$, since $a \ge b^{-1}$ would imply $a > c$ for some $c \in P \setminus A$. Thus $AA^{-1} \le I$. On the other hand, if $0 < d < 1$ then, by Lemma 17, there exists $a \in A$ such that $ad^{-1} \notin A$. Choose $a' \in A$ so that $a < a'$, and put $b = (a')^{-1}d$. Then $b < a^{-1}d$. Since $a^{-1}d = (ad^{-1})^{-1}$, it follows that $b \in A^{-1}$ and consequently $d = a'b \in AA^{-1}$. Thus $I \le AA^{-1}$.


For any positive rational number $a$, the set $A_a$ consisting of all positive rational numbers $c$ such that $c < a$ is a cut. The map $a \mapsto A_a$ of $P$ into $\mathcal{P}$ is injective and preserves sums and products:

$$A_{a+b} = A_a + A_b, \qquad A_{ab} = A_a A_b.$$

Moreover, $A_a < A_b$ if and only if $a < b$.
By identifying $a$ with $A_a$, we may regard $P$ as a subset of $\mathcal{P}$. It is a proper subset, since (P4) does not hold in $P$.


22 I The Expanding Universe of Numbers 


This completes the first stage of Dedekind’s construction. In the second stage we 
pass from cuts to real numbers. Intuitively, a real number is the difference of two cuts. 
We will deal with the second stage rather briefly since, as has been said, it is completely 
analogous to the passage from the natural numbers to the integers. 

On the set $\mathcal{P} \times \mathcal{P}$ of all ordered pairs of cuts an equivalence relation is defined by

$$(A, B) \sim (A', B') \quad \text{if } A + B' = A' + B.$$


We define a real number to be an equivalence class of ordered pairs of cuts and, as is now customary, we denote the set of all real numbers by $\mathbb{R}$.
Addition and multiplication are unambiguously defined by

$$(A, B) + (C, D) = (A + C, B + D),$$
$$(A, B) \cdot (C, D) = (AC + BD, AD + BC).$$

They obey the laws (A2)–(A5), (M2)–(M5) and (AM1)–(AM2).

A real number represented by $(A, B)$ is said to be positive if $B < A$. If we denote by $\mathcal{P}'$ the set of all positive real numbers, then (P1)–(P3) continue to hold with $\mathcal{P}'$ in place of $P$. An order relation, satisfying (O1)–(O3), is induced on $\mathbb{R}$ by writing $a < b$ if $b - a \in \mathcal{P}'$. Moreover, any $a \in \mathbb{R}$ may be written in the form $a = b - c$, where $b, c \in \mathcal{P}'$. It is easily seen that $\mathcal{P}$ is isomorphic with $\mathcal{P}'$. By identifying $\mathcal{P}$ with $\mathcal{P}'$, we may regard both $\mathcal{P}$ and $\mathbb{Q}$ as subsets of $\mathbb{R}$. An element of $\mathbb{R} \setminus \mathbb{Q}$ is said to be an irrational real number.

Upper and lower bounds, and suprema and infima, may be defined for subsets of $\mathbb{R}$ in the same way as for subsets of $\mathcal{P}$. Moreover, the least upper bound property (P4) continues to hold in $\mathbb{R}$. By applying (P4) to the subset $-\mathcal{S} = \{-a : a \in \mathcal{S}\}$ we see that if a nonempty subset $\mathcal{S}$ of $\mathbb{R}$ has a lower bound, then it has a greatest lower bound.

The least upper bound property implies the so-called Archimedean property: 


Proposition 19 For any positive real numbers $a, b$, there exists a positive integer $n$ such that $na > b$.


Proof Assume, on the contrary, that $na \le b$ for every $n \in \mathbb{N}$. Then $b$ is an upper bound for the set $\{na : n \in \mathbb{N}\}$. Let $c$ be a least upper bound for this set. From $na \le c$ for every $n \in \mathbb{N}$ we obtain $(n + 1)a \le c$ for every $n \in \mathbb{N}$. But this implies $na \le c - a$ for every $n \in \mathbb{N}$. Since $c - a < c$ and $c$ is a least upper bound, we have a contradiction.


Proposition 20 For any real numbers $a, b$ with $a < b$, there exists a rational number $c$ such that $a < c < b$.


Proof Suppose first that $a \ge 0$. By Proposition 19 there exists a positive integer $n$ such that $n(b - a) > 1$. Then $b > a + n^{-1}$. There exists also a positive integer $m$ such that $mn^{-1} > a$. If $m$ is the least such positive integer, then $(m - 1)n^{-1} \le a$ and hence $mn^{-1} \le a + n^{-1} < b$. Thus we can take $c = mn^{-1}$.

If $a < 0$ and $b > 0$ we can take $c = 0$. If $a < 0$ and $b \le 0$, then $-b < d < -a$ for some rational $d$ and we can take $c = -d$.
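The proof is effective once $a$ and $b$ are known exactly. Python cannot hold an arbitrary real number, so the sketch below assumes, as a stand-in, that $a$ and $b$ are given as `Fraction`s; the name `rational_between` is hypothetical. It follows the case analysis of the proof: $n$ is the least positive integer with $n(b - a) > 1$ and $m$ the least positive integer with $m/n > a$.

```python
import math
from fractions import Fraction


def rational_between(a, b):
    """Rational c with a < c < b, following the case analysis in the proof.

    Assumption: a and b are exact Fractions standing in for real numbers.
    """
    a, b = Fraction(a), Fraction(b)
    if not a < b:
        raise ValueError("need a < b")
    if a >= 0:
        n = math.floor(1 / (b - a)) + 1   # least positive n with n(b - a) > 1
        m = math.floor(a * n) + 1         # least positive m with m/n > a
        return Fraction(m, n)             # then m/n <= a + 1/n < b
    if b > 0:
        return Fraction(0)
    return -rational_between(-b, -a)      # here -b >= 0, so the first case applies


print(rational_between(Fraction(1, 7), Fraction(1, 6)))  # 7/43
```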


Proposition 21 For any positive real number $a$, there exists a unique positive real number $b$ such that $b^2 = a$.
Proof Let $S$ be the set of all positive real numbers $x$ such that $x^2 \le a$. The set $S$ is not empty, since it contains $a$ if $a \le 1$ and $1$ if $a > 1$. If $y > 0$ and $y^2 > a$, then $y$ is an upper bound for $S$. In particular, $1 + a$ is an upper bound for $S$. Let $b$ be the least upper bound for $S$. Then $b^2 = a$, since $b^2 < a$ would imply $(b + 1/n)^2 < a$ for sufficiently large $n > 0$ and $b^2 > a$ would imply $(b - 1/n)^2 > a$ for sufficiently large $n > 0$. Finally, if $c^2 = a$ and $c > 0$, then $c = b$, since

$$(c - b)(c + b) = c^2 - b^2 = 0.$$


The unique positive real number $b$ in the statement of Proposition 21 is said to be a square root of $a$ and is denoted by $\sqrt{a}$ or $a^{1/2}$. In the same way it may be shown that, for any positive real number $a$ and any positive integer $n$, there exists a unique positive real number $b$ such that $b^n = a$, where $b^n = b \cdots b$ ($n$ times). We say that $b$ is an $n$-th root of $a$ and write $b = \sqrt[n]{a}$ or $a^{1/n}$.
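The proof of Proposition 21 locates $\sqrt{a}$ as the least upper bound of $S = \{x > 0 : x^2 \le a\}$. A bisection search makes this concrete for rational $a$. It keeps a lower point `lo` in $S$ and an upper point `hi` with `hi`$^2 > a$ (hence an upper bound for $S$), starting from the same points as the proof ($a$ or $1$, and $1 + a$). The Python sketch below (hypothetical name `sqrt_bracket`) uses exact `Fraction` arithmetic; it yields rational brackets for $\sqrt{a}$, not the real number itself.

```python
from fractions import Fraction


def sqrt_bracket(a, steps=40):
    """Return rationals lo < hi with lo^2 <= a < hi^2, after halving hi - lo `steps` times."""
    a = Fraction(a)
    if a <= 0:
        raise ValueError("a must be positive")
    lo = a if a <= 1 else Fraction(1)  # an element of S
    hi = 1 + a                         # an upper bound for S, since (1 + a)^2 > a
    for _ in range(steps):
        mid = (lo + hi) / 2
        if mid * mid <= a:
            lo = mid                   # mid lies in S
        else:
            hi = mid                   # mid is an upper bound for S
    return lo, hi


lo, hi = sqrt_bracket(2)
print(float(lo), float(hi))
```

Since `lo` always lies in $S$ and `hi` is always an upper bound, the least upper bound $b = \sqrt{a}$ stays in the interval $[\texttt{lo}, \texttt{hi}]$, whose length halves at every step.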

A set is said to be a field if two binary operations, addition and multiplication, are defined on it with the properties (A2)–(A5), (M2)–(M5) and (AM1)–(AM2). A field is said to be ordered if it contains a subset $P$ of ‘positive’ elements with the properties (P1)–(P3). An ordered field is said to be complete if, with the order induced by $P$, it has the property (P4).

Propositions 19–21 hold in any complete ordered field, since only the above properties were used in their proofs. By construction, the set $\mathbb{R}$ of all real numbers is a complete ordered field. In fact, any complete ordered field $F$ is isomorphic to $\mathbb{R}$, i.e. there exists a bijective map $\varphi: F \to \mathbb{R}$ such that, for all $a, b \in F$,

$$\varphi(a + b) = \varphi(a) + \varphi(b),$$
$$\varphi(ab) = \varphi(a)\varphi(b),$$

and $\varphi(a) > 0$ if and only if $a \in P$. We sketch the proof.

Let $e$ be the identity element for multiplication in $F$ and, for any positive integer $n$, let $ne = e + \cdots + e$ ($n$ summands). Since $F$ is ordered, $ne$ is positive and so has a multiplicative inverse. For any rational number $m/n$, where $m, n \in \mathbb{Z}$ and $n > 0$, write $(m/n)e = (me)(ne)^{-1}$ if $m > 0$, $= -((-m)e)(ne)^{-1}$ if $m < 0$, and $= 0$ if $m = 0$. The elements $(m/n)e$ form a subfield of $F$ isomorphic to $\mathbb{Q}$ and we define $\varphi((m/n)e) = m/n$. For any $a \in F$, we define $\varphi(a)$ to be the least upper bound of all rational numbers $m/n$ such that $(m/n)e < a$. One verifies first that the map $\varphi: F \to \mathbb{R}$ is bijective and that $\varphi(a) < \varphi(b)$ if and only if $a < b$. One then deduces that $\varphi$ preserves sums and products.

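Actually, any bijective map $\varphi: F \to \mathbb{R}$ which preserves sums and products is also order-preserving. For, by Proposition 21, $b > a$ if and only if $b - a = c^2$ for some $c \ne 0$, and then $\varphi(c) \ne 0$ (since $\varphi$ is injective and $\varphi(0) = 0$), so that

$$\varphi(b) - \varphi(a) = \varphi(b - a) = \varphi(c^2) = \varphi(c)^2 > 0.$$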


Those whose primary interest lies in real analysis may define R to be a complete 
ordered field and omit the tour through N, Z, Q and #. That is, one takes as axioms 
the 14 properties above which define a complete ordered field and simply assumes that 
they are consistent. 
The notion of convergence can be defined in any totally ordered set. A sequence $\{a_n\}$ is said to converge, with limit l, if for any $l', l''$ such that $l' < l < l''$, there exists a positive integer $N = N(l', l'')$ such that

$$l' < a_n < l'' \quad \text{for every } n \ge N.$$

The limit l of the convergent sequence $\{a_n\}$ is clearly uniquely determined; we write

$$\lim_{n\to\infty} a_n = l, \quad \text{or} \quad a_n \to l \ \text{ as } n \to \infty.$$

It is easily seen that any convergent sequence is bounded, i.e. it has an upper bound and a lower bound. A trivial example of a convergent sequence is the constant sequence $\{a_n\}$, where $a_n = a$ for every n; its limit is again a.

In the set $\mathbb{R}$ of real numbers, or in any totally ordered set in which each bounded sequence has a least upper bound and a greatest lower bound, the definition of convergence can be reformulated. For, let $\{a_n\}$ be a bounded sequence. Then, for any positive integer m, the subsequence $\{a_n\}_{n \ge m}$ has a greatest lower bound $b_m$ and a least upper bound $c_m$:

$$b_m = \inf_{n \ge m} a_n, \qquad c_m = \sup_{n \ge m} a_n.$$

The sequences $\{b_m\}_{m \ge 1}$ and $\{c_m\}_{m \ge 1}$ are also bounded and, for any positive integer m,

$$b_m \le b_{m+1} \le c_{m+1} \le c_m.$$

If we define the lower limit and upper limit of the sequence $\{a_n\}$ by

$$\liminf_{n\to\infty} a_n := \sup_{m \ge 1} b_m, \qquad \limsup_{n\to\infty} a_n := \inf_{m \ge 1} c_m,$$

then $\liminf_{n\to\infty} a_n \le \limsup_{n\to\infty} a_n$, and it is readily shown that $\lim_{n\to\infty} a_n = l$ if and only if

$$\liminf_{n\to\infty} a_n = l = \limsup_{n\to\infty} a_n.$$

A sequence $\{a_n\}$ is said to be nondecreasing if $a_n \le a_{n+1}$ for every n and nonincreasing if $a_{n+1} \le a_n$ for every n. It is said to be monotonic if it is either nondecreasing or nonincreasing.


Proposition 22 Any bounded monotonic sequence of real numbers is convergent. 
Proof Let $\{a_n\}$ be a bounded monotonic sequence and suppose, for definiteness, that it is nondecreasing: $a_1 \le a_2 \le a_3 \le \cdots$. In this case, in the notation used above we have $b_m = a_m$ and $c_m = c_1$ for every m. Hence

$$\liminf_{n\to\infty} a_n = \sup_{m \ge 1} a_m = c_1 = \limsup_{n\to\infty} a_n.$$
Proposition 22 may be applied to the centuries-old algorithm for calculating square roots, which is commonly used today in pocket calculators. Take any real number $a > 1$ and put

$$x_1 = (1 + a)/2.$$

Then $x_1 > 1$ and $x_1^2 > a$, since $(a - 1)^2 > 0$. Define the sequence $\{x_n\}$ recursively by

$$x_{n+1} = (x_n + a/x_n)/2 \qquad (n \ge 1).$$

It is easily verified that if $x_n > 1$ and $x_n^2 > a$, then $x_{n+1} > 1$, $x_{n+1}^2 > a$ and $x_{n+1} < x_n$. Since the inequalities hold for $n = 1$, it follows that they hold for all n. Thus the sequence $\{x_n\}$ is nonincreasing and bounded, and therefore convergent. If $x_n \to b$, then $a/x_n \to a/b$ and $x_{n+1} \to b$. Hence $b = (b + a/b)/2$, which simplifies to $b^2 = a$.
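The recursion is easy to run. This Python sketch returns the iterates $x_1, x_2, \ldots$, which decrease toward $\sqrt{a}$ as the argument predicts. The function name `heron_sqrt` and the fixed number of steps are our own choices, and floating-point arithmetic stands in for exact real arithmetic.

```python
def heron_sqrt(a: float, steps: int = 6) -> list[float]:
    """Iterates x_1, ..., x_steps of x_{n+1} = (x_n + a/x_n)/2 with x_1 = (1 + a)/2.

    Assumes a > 1, as in the text.
    """
    if a <= 1:
        raise ValueError("the text assumes a > 1")
    xs = [(1 + a) / 2]
    for _ in range(steps - 1):
        x = xs[-1]
        xs.append((x + a / x) / 2)
    return xs


for x in heron_sqrt(2.0):
    print(x, x * x - 2)  # x decreases; x*x - 2 shrinks toward 0 (up to rounding)
```

Each step is Newton's method applied to $x^2 - a$, so once the iterates are close to $\sqrt{a}$ the number of correct digits roughly doubles per step.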

We consider now sequences of real numbers which are not necessarily monotonic. 


Lemma 23 Any sequence $\{a_n\}$ of real numbers has a monotonic subsequence.


Proof Let M be the set of all positive integers m such that $a_m \ge a_n$ for every $n > m$. If M contains infinitely many positive integers $m_1 < m_2 < \cdots$, then $\{a_{m_k}\}$ is a nonincreasing subsequence of $\{a_n\}$. If M is empty or finite, there is a positive integer $n_1$ such that no positive integer $n \ge n_1$ is in M. Then $a_{n_2} > a_{n_1}$ for some $n_2 > n_1$, $a_{n_3} > a_{n_2}$ for some $n_3 > n_2$, and so on. Thus $\{a_{n_k}\}$ is a nondecreasing subsequence of $\{a_n\}$.


It is clear from the proof that Lemma 23 also holds for sequences of elements of any totally ordered set. In the case of $\mathbb{R}$, however, it follows at once from Lemma 23 and Proposition 22 that


Proposition 24 Any bounded sequence of real numbers has a convergent subse- 
quence. 


Proposition 24 is often called the Bolzano–Weierstrass theorem. It was stated by
Bolzano (c. 1830) in work which remained unpublished until a century later. It became 
generally known through the lectures of Weierstrass (c. 1874). 

A sequence $\{a_n\}$ of real numbers is said to be a fundamental sequence, or ‘Cauchy sequence’, if for each $\varepsilon > 0$ there exists a positive integer $N = N(\varepsilon)$ such that

$$-\varepsilon < a_p - a_q < \varepsilon \quad \text{for all } p, q \ge N.$$

Any fundamental sequence $\{a_n\}$ is bounded, since any finite set is bounded and

$$a_N - \varepsilon < a_p < a_N + \varepsilon \quad \text{for } p \ge N.$$

Also, any convergent sequence is a fundamental sequence. For suppose $a_n \to l$ as $n \to \infty$. Then, for any $\varepsilon > 0$, there exists a positive integer N such that

$$l - \varepsilon/2 < a_n < l + \varepsilon/2 \quad \text{for every } n \ge N.$$

It follows that

$$-\varepsilon < a_p - a_q < \varepsilon \quad \text{for } p, q \ge N.$$


The definitions of convergent sequence and fundamental sequence, and the preced- 
ing result that ‘convergent’ implies ‘fundamental’, hold also for sequences of rational 
numbers, and even for sequences with elements from any ordered field. However, for 
sequences of real numbers there is a converse result: 
Proposition 25 Any fundamental sequence of real numbers is convergent.


Proof If $\{a_n\}$ is a fundamental sequence of real numbers, then $\{a_n\}$ is bounded and, for any $\varepsilon > 0$, there exists a positive integer $m = m(\varepsilon)$ such that

$$-\varepsilon/2 < a_p - a_q < \varepsilon/2 \quad \text{for all } p, q \ge m.$$

But, by Proposition 24, the sequence $\{a_n\}$ has a convergent subsequence $\{a_{n_k}\}$. If l is the limit of this subsequence, then there exists a positive integer $N \ge m$ such that

$$l - \varepsilon/2 < a_{n_k} < l + \varepsilon/2 \quad \text{for } n_k \ge N.$$

It follows that

$$l - \varepsilon < a_n < l + \varepsilon \quad \text{for } n \ge N.$$

Thus the sequence $\{a_n\}$ converges with limit l.


Proposition 25 was known to Bolzano (1817) and was clearly stated in the influ- 
ential Cours d’analyse of Cauchy (1821). However, a rigorous proof was impossible 
until the real numbers themselves had been precisely defined. 

The Méray–Cantor method of constructing the real numbers from the rationals is based on Proposition 25. We define two fundamental sequences $\{a_n\}$ and $\{a'_n\}$ of rational numbers to be equivalent if $a_n - a'_n \to 0$ as $n \to \infty$. This is indeed an equivalence relation, and we define a real number to be an equivalence class of fundamental sequences. The set of all real numbers acquires the structure of a field if addition and multiplication are defined by

$$\{a_n\} + \{b_n\} = \{a_n + b_n\}, \qquad \{a_n\} \cdot \{b_n\} = \{a_n b_n\}.$$

It acquires the structure of a complete ordered field if the fundamental sequence $\{a_n\}$ is said to be positive when it has a positive lower bound. The field $\mathbb{Q}$ of rational numbers may be regarded as a subfield of the field thus constructed by identifying the rational number a with the equivalence class containing the constant sequence $\{a_n\}$, where $a_n = a$ for every n.
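For a concrete fundamental sequence of rational numbers with no rational limit, one can run the square-root recursion $x_{n+1} = (x_n + 2/x_n)/2$, $x_1 = 3/2$, in exact rational arithmetic. In the Méray–Cantor construction the equivalence class of this sequence is the real number $\sqrt{2}$. The sketch below uses Python's standard `fractions.Fraction`; the function name `heron_rationals` is our own.

```python
from fractions import Fraction


def heron_rationals(a: int, steps: int) -> list[Fraction]:
    """Exact rational iterates x_1 = (1 + a)/2, x_{n+1} = (x_n + a/x_n)/2."""
    xs = [Fraction(1 + a, 2)]
    for _ in range(steps - 1):
        x = xs[-1]
        xs.append((x + a / x) / 2)  # int / Fraction stays a Fraction
    return xs


for x in heron_rationals(2, 5):
    # Every x_n is rational and x_n**2 - 2 > 0 shrinks rapidly, but it is never 0,
    # since no rational number has square 2.
    print(x, float(x * x - 2))
```

The first iterates are 3/2, 17/12, 577/408; their differences shrink rapidly, which is what makes the sequence fundamental in $\mathbb{Q}$.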

It is not difficult to show that an ordered field is complete if every bounded 
monotonic sequence is convergent, or if every bounded sequence has a convergent 
subsequence. In this sense, Propositions 22 and 24 state equivalent forms for the least 
upper bound property. This is not true, however, for Proposition 25. An ordered field 
need not have the least upper bound property, even though every fundamental sequence 
is convergent. It is true, however, that an ordered field has the least upper bound 
property if and only if it has the Archimedean property (Proposition 19) and every 
fundamental sequence is convergent. 

In a course of real analysis one would now define continuity and prove those properties of continuous functions which, in the 18th century, were assumed as ‘geometrically obvious’. For example, for given $a, b \in \mathbb{R}$ with $a < b$, let $I = [a, b]$ be the interval consisting of all $x \in \mathbb{R}$ such that $a \le x \le b$. If $f: I \to \mathbb{R}$ is continuous, then it attains its supremum, i.e. there exists $c \in I$ such that $f(x) \le f(c)$ for every $x \in I$. Also, if $f(a)f(b) < 0$, then $f(d) = 0$ for some $d \in I$ (the intermediate-value theorem). Real analysis is not our primary concern, however, and we do not feel obliged to establish even those properties which we may later use.
4 Metric Spaces


The notion of convergence is meaningful not only for points on a line, but also for 
points in space, where there is no natural relation of order. We now reformulate our 
previous definition, so as to make it more generally applicable. 

The absolute value $|a|$ of a real number a is defined by

$$|a| = a \ \text{ if } a \ge 0, \qquad |a| = -a \ \text{ if } a < 0.$$

It is easily seen that absolute values have the following properties:

$$|0| = 0, \quad |a| > 0 \ \text{ if } a \ne 0;$$
$$|a| = |-a|;$$
$$|a + b| \le |a| + |b|.$$

The first two properties follow at once from the definition. To prove the third, we observe first that $a + b \le |a| + |b|$, since $a \le |a|$ and $b \le |b|$. Replacing a by $-a$ and b by $-b$, we obtain also $-(a + b) \le |a| + |b|$. But $|a + b|$ is either $a + b$ or $-(a + b)$.

The distance between two real numbers a and b is defined to be the real number

$$d(a, b) = |a - b|.$$

From the preceding properties of absolute values we obtain their counterparts for distances:

(D1) $d(a, a) = 0$, $d(a, b) > 0$ if $a \ne b$;

(D2) $d(a, b) = d(b, a)$;

(D3) $d(a, b) \le d(a, c) + d(c, b)$.

The third property is known as the triangle inequality, since it may be interpreted as 
saying that, in any triangle, the length of one side does not exceed the sum of the 
lengths of the other two. 

Fréchet (1906) recognized these three properties as the essential characteristics of any measure of distance and introduced the following general concept. A set E is a metric space if with each ordered pair (a, b) of elements of E there is associated a real number d(a, b), so that the properties (D1)–(D3) hold for all $a, b, c \in E$.

We note first some simple consequences of these properties. For all $a, b, a', b' \in E$ we have

$$|d(a, b) - d(a', b')| \le d(a, a') + d(b, b') \qquad (*)$$

since, by (D2) and (D3),

$$d(a, b) \le d(a, a') + d(a', b') + d(b', b),$$
$$d(a', b') \le d(a', a) + d(a, b) + d(b, b').$$

Taking $b = b'$ in (*), we obtain from (D1),

$$|d(a, b) - d(a', b)| \le d(a, a'). \qquad (**)$$
In any metric space there is a natural topology. A subset G of a metric space E is open if for each $x \in G$ there is a positive real number $\delta = \delta(x)$ such that G also contains the whole open ball $B_\delta(x) = \{y \in E : d(x, y) < \delta\}$. A set $F \subseteq E$ is closed if its complement $E \setminus F$ is open.

For any set $A \subseteq E$, its closure $\bar{A}$ is the intersection of all closed sets containing it, and its interior $\operatorname{int} A$ is the union of all open sets contained in it.

A subset F of E is connected if it is not contained in the union of two open subsets of E whose intersections with F are disjoint and nonempty. A subset F of E is (sequentially) compact if every sequence of elements of F has a subsequence converging to an element of F (and locally compact if this holds for every bounded sequence of elements of F).

A map $f: X \to Y$ from one metric space X to another metric space Y is continuous if, for each open subset G of Y, the set of all $x \in X$ such that $f(x) \in G$ is an open subset of X. The two properties stated at the end of §3 admit far-reaching generalizations for continuous maps between subsets of metric spaces, namely that under a continuous map the image of a compact set is again compact, and the image of a connected set is again connected.

There are many examples of metric spaces: 


(i) Let $E = \mathbb{R}^n$ be the set of all n-tuples $a = (\alpha_1, \ldots, \alpha_n)$ of real numbers and define

$$d(b, c) = |b - c|,$$

where $b - c = (\beta_1 - \gamma_1, \ldots, \beta_n - \gamma_n)$ if $b = (\beta_1, \ldots, \beta_n)$ and $c = (\gamma_1, \ldots, \gamma_n)$, and

$$|a| = \max_{1 \le j \le n} |\alpha_j|.$$

Alternatively, one can replace the norm $|a|$ by either

$$|a|_1 = \sum_{j=1}^{n} |\alpha_j| \quad \text{or} \quad |a|_2 = \Big(\sum_{j=1}^{n} |\alpha_j|^2\Big)^{1/2}.$$

In the latter case, $d(b, c)$ is the Euclidean distance between b and c. The triangle inequality in this case follows from the Cauchy–Schwarz inequality: for any real numbers $\beta_j, \gamma_j$ $(j = 1, \ldots, n)$,

$$\Big(\sum_{j=1}^{n} \beta_j\gamma_j\Big)^2 \le \Big(\sum_{j=1}^{n} \beta_j^2\Big)\Big(\sum_{j=1}^{n} \gamma_j^2\Big).$$


(ii) Let $E = \mathbb{F}_2^n$ be the set of all n-tuples $a = (\alpha_1, \ldots, \alpha_n)$, where $\alpha_j = 0$ or 1 for each j, and define the Hamming distance $d(b, c)$ between $b = (\beta_1, \ldots, \beta_n)$ and $c = (\gamma_1, \ldots, \gamma_n)$ to be the number of j such that $\beta_j \ne \gamma_j$. This metric space plays a basic role in the theory of error-correcting codes.
(iii) Let $E = \mathcal{C}(I)$ be the set of all continuous functions $f: I \to \mathbb{R}$, where

$$I = [a, b] = \{x \in \mathbb{R} : a \le x \le b\}$$

is an interval of $\mathbb{R}$, and define $d(g, h) = |g - h|$, where

$$|f| = \sup_{a \le x \le b} |f(x)|.$$

(A well-known property of continuous functions ensures that f is bounded on I.) Alternatively, one can replace the norm $|f|$ by either

$$|f|_1 = \int_a^b |f(x)|\,dx \quad \text{or} \quad |f|_2 = \Big(\int_a^b |f(x)|^2\,dx\Big)^{1/2}.$$


(iv) Let $E = \mathcal{C}(\mathbb{R})$ be the set of all continuous functions $f: \mathbb{R} \to \mathbb{R}$ and define

$$d(g, h) = \sum_{N \ge 1} \frac{d_N(g, h)}{2^N\,[1 + d_N(g, h)]},$$

where $d_N(g, h) = \sup_{|x| \le N} |g(x) - h(x)|$. The triangle inequality (D3) follows from the inequality

$$\frac{|\alpha + \beta|}{1 + |\alpha + \beta|} \le \frac{|\alpha|}{1 + |\alpha|} + \frac{|\beta|}{1 + |\beta|}$$

for arbitrary real numbers $\alpha, \beta$.

The metric here has the property that $d(f_n, f) \to 0$ if and only if $f_n(x) \to f(x)$ uniformly on every bounded subinterval of $\mathbb{R}$. It may be noted that, even though E is a vector space, the metric is not derived from a norm since, if $\lambda \in \mathbb{R}$, one may have $d(\lambda g, \lambda h) \ne |\lambda|\, d(g, h)$.


(v) Let E be the set of all measurable functions $f: I \to \mathbb{R}$, where $I = [a, b]$ is an interval of $\mathbb{R}$, and define

$$d(g, h) = \int_a^b |g(x) - h(x)|\,\big(1 + |g(x) - h(x)|\big)^{-1}\,dx.$$

In order to obtain (D1), we identify functions which take the same value at all points of I, except for a set of measure zero.

Convergence with respect to this metric coincides with convergence in measure, which plays a role in the theory of probability.


(vi) Let $E = \mathbb{F}_2^{\infty}$ be the set of all infinite sequences $a = (\alpha_1, \alpha_2, \ldots)$, where $\alpha_j = 0$ or 1 for every j, and define $d(a, a) = 0$, $d(a, b) = 2^{-k}$ if $a \ne b$, where $b = (\beta_1, \beta_2, \ldots)$ and k is the least positive integer such that $\alpha_k \ne \beta_k$.

Here the triangle inequality holds in the stronger form

$$d(a, b) \le \max[d(a, c), d(c, b)].$$


This metric space plays a basic role in the theory of dynamical systems.


(vii) A connected graph can be given the structure of a metric space by defining the dis- 
tance between two vertices to be the number of edges on the shortest path joining them. 
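Several of these distances are easy to compute directly. The Python sketch below (all function names are our own) implements the three distances of example (i), the Hamming distance of example (ii) and the graph distance of example (vii) by breadth-first search. For example (vi) it assumes the normalization $2^{-k}$ and works on finite prefixes only, so two prefixes that agree are reported as distance 0 even though the full sequences might differ later.

```python
import math
from collections import deque


def d_sup(b, c):
    """Example (i): |b - c| = max_j |beta_j - gamma_j|."""
    return max(abs(x - y) for x, y in zip(b, c))


def d_1(b, c):
    """Example (i), first alternative: |b - c|_1 = sum_j |beta_j - gamma_j|."""
    return sum(abs(x - y) for x, y in zip(b, c))


def d_2(b, c):
    """Example (i), second alternative: the Euclidean distance |b - c|_2."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(b, c)))


def hamming(b, c):
    """Example (ii): the number of positions j with beta_j != gamma_j."""
    return sum(1 for x, y in zip(b, c) if x != y)


def d_prefix(a, b):
    """Example (vi) on finite 0/1 prefixes (assumed normalization 2**-k):
    2**-k for the first index k (counting from 1) with a_k != b_k,
    and 0.0 if the prefixes agree."""
    for k, (x, y) in enumerate(zip(a, b), start=1):
        if x != y:
            return 2.0 ** -k
    return 0.0


def graph_distance(adj, u, v):
    """Example (vii): edges on a shortest u-v path, by breadth-first search.

    adj maps each vertex to a list of its neighbours.
    """
    dist = {u: 0}
    queue = deque([u])
    while queue:
        w = queue.popleft()
        if w == v:
            return dist[w]
        for x in adj[w]:
            if x not in dist:
                dist[x] = dist[w] + 1
                queue.append(x)
    raise ValueError("v is not reachable from u; the graph is not connected")


square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}  # a 4-cycle
print(d_sup((1, 2, 3), (4, 0, 3)), d_1((1, 2, 3), (4, 0, 3)))  # 3 5
print(hamming((0, 1, 1, 0), (1, 1, 0, 0)))                       # 2
print(d_prefix((0, 1, 1, 0), (0, 1, 0, 1)))                      # 0.125
print(graph_distance(square, 0, 2))                              # 2
```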


Let E be an arbitrary metric space and $\{a_n\}$ a sequence of elements of E. The sequence $\{a_n\}$ is said to converge, with limit $a \in E$, if

$$d(a_n, a) \to 0 \quad \text{as } n \to \infty,$$

i.e. if for each real $\varepsilon > 0$ there is a corresponding positive integer $N = N(\varepsilon)$ such that

$$d(a_n, a) < \varepsilon \quad \text{for every } n \ge N.$$

The limit a is uniquely determined, since if also $d(a_n, a') \to 0$, then

$$d(a, a') \le d(a_n, a) + d(a_n, a'),$$

and the right side can be made arbitrarily small by taking n sufficiently large. We write

$$\lim_{n\to\infty} a_n = a, \quad \text{or} \quad a_n \to a \ \text{ as } n \to \infty.$$

If the sequence $\{a_n\}$ has limit a, then so also does any (infinite) subsequence.

If $a_n \to a$ and $b_n \to b$, then $d(a_n, b_n) \to d(a, b)$, as one sees by taking $a' = a_n$ and $b' = b_n$ in (*).

The sequence $\{a_n\}$ is said to be a fundamental sequence, or ‘Cauchy sequence’, if for each real $\varepsilon > 0$ there is a corresponding positive integer $N = N(\varepsilon)$ such that

$$d(a_m, a_n) < \varepsilon \quad \text{for all } m, n \ge N.$$

If $\{a_n\}$ and $\{b_n\}$ are fundamental sequences then, by (*), the sequence $\{d(a_n, b_n)\}$ of real numbers is a fundamental sequence, and therefore convergent.

A set $S \subseteq E$ is said to be bounded if the set of all real numbers $d(a, b)$ with $a, b \in S$ is a bounded subset of $\mathbb{R}$.

Any fundamental sequence $\{a_n\}$ is bounded, since if

$$d(a_m, a_n) < 1 \quad \text{for all } m, n \ge N,$$

then

$$d(a_m, a_n) < 1 + \delta \quad \text{for all } m, n \in \mathbb{N},$$

where $\delta = \max_{1 \le j < k \le N} d(a_j, a_k)$.

Furthermore, any convergent sequence $\{a_n\}$ is a fundamental sequence, as one sees by taking $a = \lim_{n\to\infty} a_n$ in the inequality

$$d(a_m, a_n) \le d(a_m, a) + d(a_n, a).$$

A metric space is said to be complete if, conversely, every fundamental sequence is convergent.




By generalizing the Méray–Cantor method of extending the rational numbers to
the real numbers, Hausdorff (1913) showed that any metric space can be embedded in 
a complete metric space. To state his result precisely, we introduce some definitions. 

A subset F of a metric space E is said to be dense in E if, for each $a \in E$ and each real $\varepsilon > 0$, there exists some $b \in F$ such that $d(a, b) < \varepsilon$.

A map $\sigma$ from one metric space E to another metric space $E'$ is necessarily injective if it is distance-preserving, i.e. if

$$d'(\sigma(a), \sigma(b)) = d(a, b) \quad \text{for all } a, b \in E.$$

If the map $\sigma$ is also surjective, then it is said to be an isometry and the metric spaces E and $E'$ are said to be isometric.

A metric space $\bar{E}$ is said to be a completion of a metric space E if $\bar{E}$ is complete and E is isometric to a dense subset of $\bar{E}$. It is easily seen that any two completions of a given metric space are isometric.

Hausdorff’s result says that any metric space E has a completion $\bar{E}$. We sketch the proof. Define two fundamental sequences $\{a_n\}$ and $\{a'_n\}$ in E to be equivalent if

$$\lim_{n\to\infty} d(a_n, a'_n) = 0.$$

It is easily shown that this is indeed an equivalence relation. Moreover, if the fundamental sequences $\{a_n\}$, $\{b_n\}$ are equivalent to the fundamental sequences $\{a'_n\}$, $\{b'_n\}$ respectively, then

$$\lim_{n\to\infty} d(a_n, b_n) = \lim_{n\to\infty} d(a'_n, b'_n).$$

We can give the set $\bar{E}$ of all equivalence classes of fundamental sequences the structure of a metric space by defining

$$\bar{d}(\{a_n\}, \{b_n\}) = \lim_{n\to\infty} d(a_n, b_n).$$

For each $a \in E$, let $\bar{a}$ be the equivalence class in $\bar{E}$ which contains the fundamental sequence $\{a_n\}$ such that $a_n = a$ for every n. Since

$$\bar{d}(\bar{a}, \bar{b}) = d(a, b) \quad \text{for all } a, b \in E,$$

E is isometric to the set $E' = \{\bar{a} : a \in E\}$. It is not difficult to show that $E'$ is dense in $\bar{E}$ and that $\bar{E}$ is complete.

Which of the previous examples of metric spaces are complete? In example (i), the completeness of $\mathbb{R}^n$ with respect to the first definition of distance follows directly from the completeness of $\mathbb{R}$. It is also complete with respect to the two alternative definitions of distance, since a sequence which converges with respect to one of the three metrics also converges with respect to the other two. Indeed it is easily shown that, for every $a \in \mathbb{R}^n$,

$$|a| \le |a|_2 \le |a|_1$$

and

$$|a|_1 \le n^{1/2}|a|_2, \qquad |a|_2 \le n^{1/2}|a|.$$
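These inequalities can be spot-checked numerically. The following Python sketch compares the max norm $|a|$ with $|a|_1$ and $|a|_2$ on random vectors in $\mathbb{R}^5$, allowing a small tolerance for rounding. It is a random test, not a proof; the helper name `norms` and the sampling scheme are our own.

```python
import math
import random


def norms(a):
    """Return (|a|, |a|_1, |a|_2) for a vector a given as a sequence of reals."""
    sup = max(abs(x) for x in a)
    one = sum(abs(x) for x in a)
    two = math.sqrt(sum(x * x for x in a))
    return sup, one, two


random.seed(0)
n, eps = 5, 1e-9  # dimension and a tolerance for floating-point rounding
for _ in range(1000):
    a = [random.uniform(-10.0, 10.0) for _ in range(n)]
    sup, one, two = norms(a)
    assert sup <= two + eps and two <= one + eps
    assert one <= math.sqrt(n) * two + eps
    assert two <= math.sqrt(n) * sup + eps
print("all inequalities held")
```

A vector with all coordinates equal, such as $(1, 1, 1, 1)$, gives equality in both inequalities involving $n^{1/2}$.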
In example (ii), the completeness of $\mathbb{F}_2^n$ is trivial, since any fundamental sequence is ultimately constant.

In example (iii), the completeness of $\mathcal{C}(I)$ with respect to the first definition of distance follows from the completeness of $\mathbb{R}$ and the fact that the limit of a uniformly convergent sequence of continuous functions is again a continuous function.

However, $\mathcal{C}(I)$ is not complete with respect to either of the two alternative definitions of distance. It is possible also for a sequence to converge with respect to the two alternative definitions of distance, but not with respect to the first definition. Similarly, a sequence may converge in the first alternative metric, but not even be a fundamental sequence in the second.

The completions of the metric space $\mathcal{C}(I)$ with respect to the two alternative metrics may actually be identified with spaces of functions. The completion for the first alternative metric is the set $L(I)$ of all Lebesgue measurable functions $f: I \to \mathbb{R}$ such that

$$\int_a^b |f(x)|\,dx < \infty,$$

functions which take the same value at all points of I, except for a set of measure zero, being identified. The completion $L^2(I)$ for the second alternative metric is obtained by replacing $\int_a^b |f(x)|\,dx$ by $\int_a^b |f(x)|^2\,dx$ in this statement.

It may be shown that the metric spaces of examples (iv)–(vi) are all complete. In example (vi), the strong triangle inequality implies that $\{a_n\}$ is a fundamental sequence if (and only if) $d(a_{n+1}, a_n) \to 0$ as $n \to \infty$.

Let E be an arbitrary metric space and $f: E \to E$ a map of E into itself. A point $x \in E$ is said to be a fixed point of f if $f(x) = x$. A useful property of complete metric spaces is the following contraction principle, which was first established in the present generality by Banach (1922), but was previously known in more concrete situations.


Proposition 26 Let E be a complete metric space and let $f: E \to E$ be a map of E into itself. If there exists a real number $\theta$, with $0 < \theta < 1$, such that

$$d(f(x'), f(x'')) \le \theta\, d(x', x'') \quad \text{for all } x', x'' \in E,$$

then the map f has a unique fixed point $x \in E$.


Proof It is clear that there is at most one fixed point, since $0 \le d(x', x'') \le \theta\, d(x', x'')$ implies $x' = x''$. To prove that a fixed point exists we use the method of successive approximations.

Choose any $x_0 \in E$ and define the sequence $\{x_n\}$ recursively by

$$x_n = f(x_{n-1}) \qquad (n \ge 1).$$

For any $k \ge 1$ we have

$$d(x_{k+1}, x_k) = d(f(x_k), f(x_{k-1})) \le \theta\, d(x_k, x_{k-1}).$$

Applying this k times, we obtain

$$d(x_{k+1}, x_k) \le \theta^k d(x_1, x_0).$$

Consequently, if $n > m \ge 0$,

$$\begin{aligned} d(x_n, x_m) &\le d(x_n, x_{n-1}) + d(x_{n-1}, x_{n-2}) + \cdots + d(x_{m+1}, x_m)\\ &\le (\theta^{n-1} + \theta^{n-2} + \cdots + \theta^m)\, d(x_1, x_0)\\ &\le \theta^m (1 - \theta)^{-1} d(x_1, x_0), \end{aligned}$$

since $0 < \theta < 1$. It follows that $\{x_n\}$ is a fundamental sequence and so a convergent sequence, since E is complete. If $x = \lim_{n\to\infty} x_n$, then

$$d(f(x), x) \le d(f(x), x_{n+1}) + d(x_{n+1}, x) \le \theta\, d(x, x_n) + d(x, x_{n+1}).$$

Since the right side can be made less than any given positive real number by taking n large enough, we must have $f(x) = x$. The proof shows also that, for any $m \ge 0$,

$$d(x, x_m) \le \theta^m (1 - \theta)^{-1} d(x_1, x_0).$$
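The method of successive approximations, together with the error bound just obtained, can be tried on a simple example. Take $E = [0, 1]$ with $d(x, y) = |x - y|$ (a closed subset of $\mathbb{R}$, hence complete) and $f(x) = \cos x$. Then f maps [0, 1] into itself and, by the mean value theorem, $|\cos x' - \cos x''| \le (\sin 1)|x' - x''|$ there, so one may take $\theta = \sin 1 \approx 0.84$. The Python sketch below iterates until the bound $\theta^m(1 - \theta)^{-1}d(x_1, x_0)$ falls below a tolerance; the function name and the stopping rule are our own choices.

```python
import math


def fixed_point(f, x0, theta, tol=1e-12, max_iter=10_000):
    """Successive approximations x_m = f(x_{m-1}) for a contraction with constant theta.

    Stops at the first m for which the a priori bound
    theta**m / (1 - theta) * d(x_1, x_0) from Proposition 26 is <= tol,
    and returns (x_m, m).
    """
    x = f(x0)                 # x_1
    d10 = abs(x - x0)         # d(x_1, x_0)
    m = 1
    while theta ** m / (1 - theta) * d10 > tol:
        if m >= max_iter:
            raise RuntimeError("tolerance not reached within max_iter steps")
        x = f(x)
        m += 1
    return x, m


x, m = fixed_point(math.cos, 0.0, math.sin(1.0))
print(x, m, abs(math.cos(x) - x))  # x is close to 0.739085..., the fixed point of cos
```

The a priori bound is pessimistic here: near the fixed point the actual contraction factor is about $\sin(0.739) \approx 0.67$, so the returned iterate is far more accurate than the requested tolerance.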


The contraction principle is surprisingly powerful, considering the simplicity of its 
proof. We give two significant applications: an inverse function theorem and an exis- 
tence theorem for ordinary differential equations. In both cases we will use the notion 
of differentiability for functions of several real variables. The unambitious reader may 
simply take n = 1 in the following discussion (so that ‘invertible’ means ‘nonzero’).
Functions of several variables are important, however, and it is remarkable that the 
proper definition of differentiability in this case was first given by Stolz (1887). 

A map $\varphi: U \to \mathbb{R}^m$, where $U \subseteq \mathbb{R}^n$ is a neighbourhood of $x_0 \in \mathbb{R}^n$ (i.e., U contains some open ball $\{x \in \mathbb{R}^n : |x - x_0| < \rho\}$), is said to be differentiable at $x_0$ if there exists a linear map $A: \mathbb{R}^n \to \mathbb{R}^m$ such that

$$|\varphi(x) - \varphi(x_0) - A(x - x_0)|/|x - x_0| \to 0 \quad \text{as } |x - x_0| \to 0.$$

(The inequalities between the various norms show that it is immaterial which norm is used.) The linear map A, which is then uniquely determined, is called the derivative of $\varphi$ at $x_0$ and will be denoted by $\varphi'(x_0)$.

This definition is a natural generalization of the usual definition when $m = n = 1$, since it says that the difference $\varphi(x_0 + h) - \varphi(x_0)$ admits the linear approximation $Ah$ for $|h| \to 0$.

Evidently, if $\varphi_1$ and $\varphi_2$ are differentiable at $x_0$, then so also is $\varphi = \varphi_1 + \varphi_2$ and

$$\varphi'(x_0) = \varphi_1'(x_0) + \varphi_2'(x_0).$$

It also follows directly from the definition that derivatives satisfy the chain rule: If $\varphi: U \to \mathbb{R}^m$, where U is a neighbourhood of $x_0 \in \mathbb{R}^n$, is differentiable at $x_0$, and if $\psi: V \to \mathbb{R}^l$, where V is a neighbourhood of $y_0 = \varphi(x_0) \in \mathbb{R}^m$, is differentiable at $y_0$, then the composite map $\chi = \psi \circ \varphi: U \to \mathbb{R}^l$ is differentiable at $x_0$ and

$$\chi'(x_0) = \psi'(y_0)\,\varphi'(x_0),$$

the right side being the composite linear map.
We will also use the notion of norm of a linear map. If $A: \mathbb{R}^n \to \mathbb{R}^m$ is a linear map, its norm $|A|$ is defined by

$$|A| = \sup_{|x| \le 1} |Ax|.$$

Evidently

$$|A_1 + A_2| \le |A_1| + |A_2|.$$

Furthermore, if $B: \mathbb{R}^m \to \mathbb{R}^l$ is another linear map, then

$$|BA| \le |B|\,|A|.$$

Hence, if $m = n$ and $|A| < 1$, then the linear map $I - A$ is invertible, its inverse being given by the geometric series

$$(I - A)^{-1} = I + A + A^2 + \cdots.$$

It follows that for any invertible linear map $A: \mathbb{R}^n \to \mathbb{R}^n$, if $B: \mathbb{R}^n \to \mathbb{R}^n$ is a linear map such that $|B - A| < |A^{-1}|^{-1}$, then B is also invertible and $|B^{-1} - A^{-1}| \to 0$ as $|B - A| \to 0$. (Indeed, $B = A[I - A^{-1}(A - B)]$ and $|A^{-1}(A - B)| \le |A^{-1}|\,|B - A| < 1$.)
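The geometric series for $(I - A)^{-1}$ can be summed numerically. In the Python sketch below, plain lists of rows stand in for linear maps, and the function names and the fixed number of terms are our own choices. For the example matrix, the norm induced by the max norm (the largest row sum of absolute values) is 0.3, so the hypothesis $|A| < 1$ holds.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def neumann_inverse(A, terms=60):
    """Partial sum I + A + A^2 + ... + A^(terms-1) of the geometric series.

    Approximates (I - A)^{-1} when |A| < 1; the error is at most
    |A|^terms / (1 - |A|) in the corresponding operator norm.
    """
    n = len(A)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [row[:] for row in identity]
    power = [row[:] for row in identity]
    for _ in range(terms - 1):
        power = mat_mul(power, A)  # next power of A
        total = [[t + p for t, p in zip(t_row, p_row)]
                 for t_row, p_row in zip(total, power)]
    return total


A = [[0.2, 0.1], [0.0, 0.3]]
S = neumann_inverse(A)
print(S)  # approximately [[1.25, 0.1786], [0.0, 1.4286]]
```

Multiplying $I - A = \begin{pmatrix} 0.8 & -0.1 \\ 0 & 0.7 \end{pmatrix}$ by the result should give the identity up to rounding.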

If $\varphi: U \to \mathbb{R}^m$ is differentiable at $x_0 \in \mathbb{R}^n$, then it is also continuous at $x_0$, since

$$|\varphi(x) - \varphi(x_0)| \le |\varphi(x) - \varphi(x_0) - \varphi'(x_0)(x - x_0)| + |\varphi'(x_0)|\,|x - x_0|.$$

We say that $\varphi$ is continuously differentiable in U if it is differentiable at each point of U and if the derivative $\varphi'(x)$ is a continuous function of x in U. The inverse function theorem says:


Proposition 27 Let $U_0$ be a neighbourhood of $x_0 \in \mathbb{R}^n$ and let $\varphi: U_0 \to \mathbb{R}^n$ be a continuously differentiable map for which $\varphi'(x_0)$ is invertible.

Then, for some $\delta > 0$, the ball $U = \{x \in \mathbb{R}^n : |x - x_0| < \delta\}$ is contained in $U_0$ and


(i) the restriction of $\varphi$ to U is injective;
(ii) $V := \varphi(U)$ is open, i.e. if $\eta \in V$, then V contains all $y \in \mathbb{R}^n$ near $\eta$;
(iii) the inverse map $\psi: V \to U$ is also continuously differentiable and, if $y = \varphi(x)$, then $\psi'(y)$ is the inverse of $\varphi'(x)$.


Proof To simplify notation, assume $x_0 = \varphi(x_0) = 0$ and write $A = \varphi'(0)$. For any $y \in \mathbb{R}^n$, put

$$f_y(x) = x + A^{-1}[y - \varphi(x)].$$

Evidently x is a fixed point of $f_y$ if and only if $\varphi(x) = y$. The map $f_y$ is also continuously differentiable and

$$f_y'(x) = I - A^{-1}\varphi'(x) = A^{-1}[A - \varphi'(x)].$$

Since $\varphi'(x)$ is continuous, we can choose $\delta > 0$ so that the ball $U = \{x \in \mathbb{R}^n : |x| < \delta\}$ is contained in $U_0$ and

$$|f_y'(x)| < 1/2 \quad \text{for } x \in U.$$
If $x_1, x_2 \in U$, then

$$|f_y(x_2) - f_y(x_1)| = \Big|\int_0^1 f_y'\big((1 - t)x_1 + t x_2\big)(x_2 - x_1)\,dt\Big| \le |x_2 - x_1|/2.$$

It follows that $f_y$ has at most one fixed point in U. Since this holds for arbitrary $y \in \mathbb{R}^n$, the restriction of $\varphi$ to U is injective.

Suppose next that $\eta = \varphi(\xi)$ for some $\xi \in U$. We wish to show that, if y is near $\eta$, the map $f_y$ has a fixed point near $\xi$.

Choose $r = r(\xi) > 0$ so that the closed ball $B_r = \{x \in \mathbb{R}^n : |x - \xi| \le r\}$ is contained in U, and fix $y \in \mathbb{R}^n$ so that $|y - \eta| < r/(2|A^{-1}|)$. Then

$$|f_y(\xi) - \xi| = |A^{-1}(y - \eta)| \le |A^{-1}|\,|y - \eta| < r/2.$$

Hence if $|x - \xi| \le r$, then

$$|f_y(x) - \xi| \le |f_y(x) - f_y(\xi)| + |f_y(\xi) - \xi| < |x - \xi|/2 + r/2 \le r.$$

Thus $f_y(B_r) \subseteq B_r$. Also, if $x_1, x_2 \in B_r$, then

$$|f_y(x_2) - f_y(x_1)| \le |x_2 - x_1|/2.$$

But $B_r$ is a complete metric space, with the same metric as $\mathbb{R}^n$, since it is a closed subset (if $x_k \in B_r$ and $x_k \to x$ in $\mathbb{R}^n$, then also $x \in B_r$). Consequently, by the contraction principle (Proposition 26), $f_y$ has a fixed point $x \in B_r$. Then $\varphi(x) = y$, which proves (ii).

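Suppose now that $y, \eta \in V$. Then $y = \varphi(x)$, $\eta = \varphi(\xi)$ for unique $x, \xi \in U$. Since

$$|f_y(x) - f_y(\xi)| \le |x - \xi|/2$$

and

$$f_y(x) - f_y(\xi) = x - \xi - A^{-1}(y - \eta),$$

we have

$$|A^{-1}(y - \eta)| \ge |x - \xi|/2.$$

Thus

$$|x - \xi| \le 2|A^{-1}|\,|y - \eta|.$$

If $F = \varphi'(\xi)$ (which is invertible, since $F = A[I - A^{-1}(A - F)]$ and $|A^{-1}(A - F)| = |f_y'(\xi)| < 1/2$) and $G = F^{-1}$, then

$$\psi(y) - \psi(\eta) - G(y - \eta) = x - \xi - G(y - \eta) = -G[\varphi(x) - \varphi(\xi) - F(x - \xi)].$$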
Hence

$$|\psi(y) - \psi(\eta) - G(y - \eta)|/|y - \eta| \le 2|A^{-1}|\,|G|\,|\varphi(x) - \varphi(\xi) - F(x - \xi)|/|x - \xi|.$$

If $|y - \eta| \to 0$, then $|x - \xi| \to 0$ and the right side tends to 0. Consequently $\psi$ is differentiable at $\eta$ and $\psi'(\eta) = G = F^{-1}$.

Thus $\psi$ is differentiable in V and, a fortiori, continuous. In fact $\psi$ is continuously differentiable, since F is a continuous function of $\xi$ (by hypothesis), since $\xi = \psi(\eta)$ is a continuous function of $\eta$, and since $F^{-1}$ is a continuous function of F.
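The proof of (ii) is constructive: starting near $\xi$, the iteration $x \mapsto f_y(x) = x + A^{-1}[y - \varphi(x)]$ converges to the solution of $\varphi(x) = y$. The Python sketch below runs it in $\mathbb{R}^2$ for a hypothetical example of our own, $\varphi(x) = (2x_1 + x_2^2,\ 3x_2 + x_1^2)$. For this map $\varphi(0) = 0$ and $A = \varphi'(0) = \operatorname{diag}(2, 3)$. Here $f_y'(x) = \begin{pmatrix} 0 & -x_2 \\ -2x_1/3 & 0 \end{pmatrix}$ has max-row-sum norm at most 1/2 when $|x| \le 1/2$ in the max norm. Because only $A^{-1}$ is used, never $\varphi'(x)$, this is the iteration numerical analysts call the simplified Newton (or chord) method.

```python
def solve_phi(phi, A, y, x0=(0.0, 0.0), steps=100):
    """Iterate x -> x + A^{-1}[y - phi(x)] in R^2, as in the proof of Proposition 27.

    phi maps a pair to a pair, and A = phi'(0) is an invertible 2x2 matrix.
    Convergence is only guaranteed under the hypotheses of the proof
    (f_y a contraction on a ball around x0 that it maps into itself).
    """
    (a, b), (c, d) = A
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))  # A^{-1} for a 2x2 matrix
    x = list(x0)
    for _ in range(steps):
        p = phi(x)
        r = (y[0] - p[0], y[1] - p[1])  # residual y - phi(x)
        x = [x[0] + inv[0][0] * r[0] + inv[0][1] * r[1],
             x[1] + inv[1][0] * r[0] + inv[1][1] * r[1]]
    return x


def phi(x):
    """Hypothetical example map with phi(0) = 0 and phi'(0) = diag(2, 3)."""
    return (2 * x[0] + x[1] ** 2, 3 * x[1] + x[0] ** 2)


x = solve_phi(phi, ((2.0, 0.0), (0.0, 3.0)), (0.1, -0.05))
print(x, phi(x))  # phi(x) should reproduce y = (0.1, -0.05) up to rounding
```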


To bring out the meaning of Proposition 27 we add some remarks:


(i) The invertibility of $\varphi'(x_0)$ is necessary for the existence of a differentiable inverse map, but not for the existence of a continuous inverse map. For example, the continuously differentiable map $\varphi: \mathbb{R} \to \mathbb{R}$ defined by $\varphi(x) = x^3$ is bijective and has the continuous inverse $\psi(y) = y^{1/3}$, although $\varphi'(0) = 0$.


(ii) The hypothesis that $\varphi$ is continuously differentiable cannot be totally dispensed with. For example, the map $\varphi: \mathbb{R} \to \mathbb{R}$ defined by

$$\varphi(x) = x + x^2 \sin(1/x) \ \text{ if } x \ne 0, \qquad \varphi(0) = 0,$$

is everywhere differentiable and $\varphi'(0) \ne 0$, but $\varphi$ is not injective in any neighbourhood of 0.


(iii) The inverse map may not be defined throughout $U_0$. For example, the map $\varphi: \mathbb{R}^2 \to \mathbb{R}^2$ defined by

$$\varphi_1(x_1, x_2) = x_1^2 - x_2^2, \qquad \varphi_2(x_1, x_2) = 2x_1x_2,$$

is everywhere continuously differentiable and has an invertible derivative at every point except the origin. Thus the hypotheses of Proposition 27 are satisfied in any connected open set $U_0 \subseteq \mathbb{R}^2$ which does not contain the origin, and yet $\varphi(1, 1) = \varphi(-1, -1)$.


It was first shown by Cauchy (c. 1844) that, under quite general conditions, an 
ordinary differential equation has local solutions. The method of successive approxi- 
mations (i.e., the contraction principle) was used for this purpose by Picard (1890): 


Proposition 28 Let $t_0 \in \mathbb{R}$, $\xi_0 \in \mathbb{R}^n$ and let U be a neighbourhood of $(t_0, \xi_0)$ in $\mathbb{R} \times \mathbb{R}^n$. If $\varphi: U \to \mathbb{R}^n$ is a continuous map with a derivative $\varphi'$ with respect to x that is continuous in U, then the differential equation

$$dx/dt = \varphi(t, x) \qquad (1)$$

has a unique solution x(t) which satisfies the initial condition

$$x(t_0) = \xi_0 \qquad (2)$$

and is defined in some interval $|t - t_0| \le \delta$, where $\delta > 0$.
Proof If x(t) is a solution of the differential equation (1) which satisfies the initial condition (2), then by integration we get

$$x(t) = \xi_0 + \int_{t_0}^{t} \varphi[\tau, x(\tau)]\,d\tau.$$

Conversely, if a continuous function x(t) satisfies this relation then, since $\varphi$ is continuous, x(t) is actually differentiable and is a solution of (1) that satisfies (2). Hence we need only show that the map $\mathcal{T}$ defined by

$$(\mathcal{T}x)(t) = \xi_0 + \int_{t_0}^{t} \varphi[\tau, x(\tau)]\,d\tau$$

has a unique fixed point in the space of continuous functions.
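There exist positive constants M, L such that

$$|\varphi(t, \xi)| \le M, \qquad |\varphi'(t, \xi)| \le L$$

for all $(t, \xi)$ in a neighbourhood of $(t_0, \xi_0)$, which we may take to be U. If $(t, \xi_1) \in U$ and $(t, \xi_2) \in U$, then

$$|\varphi(t, \xi_2) - \varphi(t, \xi_1)| = \Big|\int_0^1 \varphi'\big(t, (1 - u)\xi_1 + u\xi_2\big)(\xi_2 - \xi_1)\,du\Big| \le L|\xi_2 - \xi_1|.$$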


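Choose $\delta > 0$ so that the box $|t - t_0| \le \delta$, $|\xi - \xi_0| \le M\delta$ is contained in U and also $L\delta < 1$. Take $J = [t_0 - \delta, t_0 + \delta]$ and let $\mathcal{C}(J)$ be the complete metric space of all continuous functions $x: J \to \mathbb{R}^n$ with the distance function

$$d(x_1, x_2) = \sup_{t \in J} |x_1(t) - x_2(t)|.$$

The constant function $x_0(t) = \xi_0$ is certainly in $\mathcal{C}(J)$. Let E be the subset of all $x \in \mathcal{C}(J)$ such that $x(t_0) = \xi_0$ and $d(x, x_0) \le M\delta$. Evidently if $x_k \in E$ and $x_k \to x$ in $\mathcal{C}(J)$, then $x \in E$. Hence E is also a complete metric space with the same metric. Moreover $\mathcal{T}(E) \subseteq E$, since if $x \in E$ then $(\mathcal{T}x)(t_0) = \xi_0$ and, for all $t \in J$,

$$|(\mathcal{T}x)(t) - \xi_0| = \Big|\int_{t_0}^{t} \varphi[\tau, x(\tau)]\,d\tau\Big| \le M\delta.$$

Furthermore, if $x_1, x_2 \in E$, then $d(\mathcal{T}x_1, \mathcal{T}x_2) \le L\delta\, d(x_1, x_2)$, since for all $t \in J$,

$$|(\mathcal{T}x_1)(t) - (\mathcal{T}x_2)(t)| = \Big|\int_{t_0}^{t} \{\varphi[\tau, x_1(\tau)] - \varphi[\tau, x_2(\tau)]\}\,d\tau\Big| \le L\delta\, d(x_1, x_2).$$

Since $L\delta < 1$, the result now follows from Proposition 26.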
Proposition 28 only guarantees the local existence of solutions, but this is in the nature of things. For example, if $n = 1$, the unique solution of the differential equation

$$dx/dt = x^2$$

such that $x(t_0) = \xi_0 > 0$ is given by

$$x(t) = \{1 - (t - t_0)\xi_0\}^{-1}\xi_0.$$

Thus the solution is defined only for $t < t_0 + \xi_0^{-1}$, even though the differential equation itself has exemplary behaviour everywhere.

To illustrate Proposition 28, take $n = 1$ and let $E(t)$ be the solution of the (linear) differential equation

$$dx/dt = x \qquad (3)$$

which satisfies the initial condition $E(0) = 1$. Then $E(t)$ is defined for $|t| < R$, for some $R > 0$. If $|\tau| < R/2$ and $x_1(t) = E(t + \tau)$, then $x_1(t)$ is the solution of the differential equation (3) which satisfies the initial condition $x_1(0) = E(\tau)$. But $x_2(t) = E(\tau)E(t)$ satisfies the same differential equation and the same initial condition. Hence we must have $x_1(t) = x_2(t)$ for $|t| < R/2$, i.e.

$$E(t + \tau) = E(t)E(\tau). \qquad (4)$$

In particular,

$$E(t)E(-t) = 1, \qquad E(2t) = E(t)^2.$$

The last relation may be used to extend the definition of $E(t)$, so that it is continuously differentiable and a solution of (3) also for $|t| < 2R$. It follows that the solution $E(t)$ is defined for all $t \in \mathbb{R}$ and satisfies the addition theorem (4) for all $t, \tau \in \mathbb{R}$.

It is instructive to carry through the method of successive approximations explicitly in this case. If we take $x_0(t)$ to be the constant 1, then

$$x_1(t) = 1 + \int_0^t x_0(\tau)\, d\tau = 1 + t,$$

$$x_2(t) = 1 + \int_0^t x_1(\tau)\, d\tau = 1 + t + t^2/2.$$

By induction we obtain, for every $n \ge 1$,

$$x_n(t) = 1 + t + t^2/2! + \cdots + t^n/n!.$$

Since $x_n(t) \to E(t)$ as $n \to \infty$, we obtain for the solution $E(t)$ the infinite series representation

$$E(t) = 1 + t + t^2/2! + t^3/3! + \cdots,$$

valid actually for every $t \in \mathbb{R}$. In particular,

$$e := E(1) = 1 + 1 + 1/2! + 1/3! + \cdots = 2.7182818\ldots.$$
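The iteration above is easy to carry out mechanically. As a minimal sketch (the representation of a polynomial as a list of coefficients and the helper names are assumptions made for illustration), exact rational arithmetic reproduces $x_n(t) = 1 + t + \cdots + t^n/n!$ and, evaluated at $t = 1$, the partial sums of the series for $e$.

```python
from fractions import Fraction


def picard_step(coeffs):
    """Return the coefficients of 1 + integral_0^t x(tau) dtau, where x(t) has
    coefficients coeffs = [c0, c1, ...]; c_j t^j integrates to c_j t^(j+1)/(j+1)."""
    integrated = [Fraction(0)] + [c / (j + 1) for j, c in enumerate(coeffs)]
    integrated[0] += 1  # the constant term comes from the initial value x(0) = 1
    return integrated


def picard_iterates(n):
    """Coefficients of x_n(t), starting from x_0(t) = 1."""
    x = [Fraction(1)]
    for _ in range(n):
        x = picard_step(x)
    return x


def evaluate(coeffs, t):
    return sum(c * t ** j for j, c in enumerate(coeffs))


print(picard_iterates(3))                                 # coefficients 1, 1, 1/2, 1/6
print(float(evaluate(picard_iterates(15), Fraction(1))))  # 1 + 1 + 1/2! + ... + 1/15!
```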
Of course $E(t) = e^t$ is the exponential function. We will now adopt the usual notation, but we remark that the definition of $e^t$ as a solution of a differential equation provides a meaning for irrational $t$, as well as a simple proof of both the addition theorem and the exponential series.

The power series for $e^t$ shows that

$$e^t > 1 + t > 1 \quad \text{for every } t > 0.$$

Since $e^{-t} = (e^t)^{-1}$, it follows that $0 < e^t < 1$ for every $t < 0$. Thus $e^t > 0$ for all $t \in \mathbb{R}$. Hence, by (3), $e^t$ is a strictly increasing function. But $e^t \to +\infty$ as $t \to +\infty$ and $e^t \to 0$ as $t \to -\infty$. Consequently, since it is certainly continuous, the exponential function maps the real line $\mathbb{R}$ bijectively onto the positive half-line $\mathbb{R}_+ = \{x \in \mathbb{R} : x > 0\}$. For any $x > 0$, the unique $t \in \mathbb{R}$ such that $e^t = x$ is denoted by $\ln x$ (the natural logarithm of $x$) or simply $\log x$.
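Because $t \mapsto e^t$ is continuous and strictly increasing from 0 to $+\infty$, the equation $e^t = x$ can be solved by bisection. The following sketch (the function name `log_by_bisection` is hypothetical, and Python's `math.exp` stands in for the exponential function) recovers $\log x$ in this way; it is meant for moderate $x$, since `math.exp` overflows for arguments above about 709.

```python
import math


def log_by_bisection(x, iterations=200):
    """Return the unique t with exp(t) == x for x > 0, found by bisection.

    Uses only that exp is continuous and strictly increasing, with
    exp(t) -> 0 as t -> -inf and exp(t) -> +inf as t -> +inf.
    """
    if x <= 0:
        raise ValueError("log x is defined only for x > 0")
    lo, hi = -1.0, 1.0
    while math.exp(lo) > x:  # move the left end until exp(lo) <= x
        lo *= 2
    while math.exp(hi) < x:  # move the right end until exp(hi) >= x
        hi *= 2
    for _ in range(iterations):  # a fixed number of halvings avoids float stalls
        mid = (lo + hi) / 2
        if math.exp(mid) < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(log_by_bisection(10.0), math.log(10.0))
```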
By extending the rational numbers to the real numbers, we ensured that every positive number had a square root. By further extending the real numbers to the complex numbers, we will now ensure that all numbers have square roots.

The first use of complex numbers, by Cardano (1545), may have had its origin in the solution of cubic, rather than quadratic, equations. The cubic polynomial

$$f(x) = x^3 - 3px - 2q$$

has three real roots if $d := q^2 - p^3 < 0$ since then, for large $X > 0$,

$$f(-X) < 0, \quad f(-p^{1/2}) > 0, \quad f(p^{1/2}) < 0, \quad f(X) > 0.$$

Cardano’s formula for the three roots,

$$x = \sqrt[3]{q + \sqrt{d}} + \sqrt[3]{q - \sqrt{d}}$$

(where the two cube roots are chosen so that their product is $p$), gives real values, even though $d$ is negative, because the two summands are conjugate complex numbers. This was explicitly stated by Bombelli (1572). It is a curious fact, first proved by Hölder (1891), that if a cubic equation has three distinct real roots, then it is impossible to represent these roots solely by real radicals.
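Cardano’s formula can be checked numerically in the case $d < 0$, where complex numbers are unavoidable. In this sketch the function name is hypothetical, Python’s principal complex cube root is an assumed choice for the first summand, and the second summand is then taken as $p/u$ so that the product of the two cube roots is $p$; the three roots come out real up to rounding.

```python
import cmath


def cardano_roots(p, q):
    """Roots of x**3 - 3*p*x - 2*q from Cardano's formula.

    With d = q**2 - p**3, u is a cube root of q + sqrt(d) and v = p/u is then
    a cube root of q - sqrt(d), because (q + sqrt(d))(q - sqrt(d)) = p**3.
    The three roots are u*w**k + v*w**(-k) with w = exp(2*pi*i/3).
    Assumes q + sqrt(d) != 0.
    """
    d = q * q - p ** 3
    u = (q + cmath.sqrt(d)) ** (1 / 3)  # principal complex cube root
    v = p / u
    w = cmath.exp(2j * cmath.pi / 3)
    return [u * w ** k + v * w ** (-k) for k in range(3)]


# x**3 - 7x + 6 = (x - 1)(x - 2)(x + 3): here p = 7/3, q = -3 and d < 0.
for r in cardano_roots(7 / 3, -3.0):
    print(r)  # imaginary parts vanish up to rounding
```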

Intuitively, complex numbers are expressions of the form $a + ib$, where $a$ and $b$ are real numbers and $i^2 = -1$. But what is $i$? Hamilton (1835) defined complex numbers as ordered pairs of real numbers, with appropriate rules for addition and multiplication. Although this approach is similar to that already used in this chapter, and actually was its first appearance, we now choose a different method.

We define a complex number to be a $2 \times 2$ matrix of the form

$$A = \begin{pmatrix} a & b \\ -b & a \end{pmatrix},$$
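where $a$ and $b$ are real numbers. The set of all complex numbers is customarily denoted by $\mathbb{C}$. We may define addition and multiplication in $\mathbb{C}$ to be matrix addition and multiplication, since $\mathbb{C}$ is closed under these operations: if

$$A = \begin{pmatrix} a & b \\ -b & a \end{pmatrix}, \quad B = \begin{pmatrix} c & d \\ -d & c \end{pmatrix},$$

then

$$A + B = \begin{pmatrix} a + c & b + d \\ -(b + d) & a + c \end{pmatrix}, \quad AB = \begin{pmatrix} ac - bd & ad + bc \\ -(ad + bc) & ac - bd \end{pmatrix}.$$

Furthermore $\mathbb{C}$ contains

$$O = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}, \quad I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},$$

and $A \in \mathbb{C}$ implies $-A \in \mathbb{C}$.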

It follows from the properties of matrix addition and multiplication that addition and multiplication of complex numbers have the properties (A2)–(A5), (M2)–(M4) and (AM1)–(AM2), with $O$ and $I$ as identity elements for addition and multiplication respectively. The property (M5) also holds, since if $a$ and $b$ are not both zero, and if

$$a' = a/(a^2 + b^2), \quad b' = -b/(a^2 + b^2),$$

then

$$A^{-1} = \begin{pmatrix} a' & b' \\ -b' & a' \end{pmatrix}$$

is a multiplicative inverse of $A$. Thus $\mathbb{C}$ satisfies the axioms for a field.
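The closure and inverse computations above are easy to confirm mechanically. A minimal sketch, with hypothetical helper names and matrices stored as nested tuples, using exact `Fraction` arithmetic for the inverse:

```python
from fractions import Fraction


def cmat(a, b):
    """The matrix ((a, b), (-b, a)), i.e. the complex number a + ib."""
    return ((a, b), (-b, a))


def mat_mul(A, B):
    """Ordinary 2 x 2 matrix multiplication."""
    return tuple(
        tuple(sum(A[r][k] * B[k][c] for k in range(2)) for c in range(2))
        for r in range(2)
    )


def cmat_inverse(A):
    """Inverse from a' = a/(a^2 + b^2), b' = -b/(a^2 + b^2); needs (a, b) != (0, 0)."""
    a, b = Fraction(A[0][0]), Fraction(A[0][1])
    norm = a * a + b * b
    return cmat(a / norm, -b / norm)


A, B = cmat(2, 3), cmat(-1, 4)
print(mat_mul(A, B))                # ((-14, 5), (-5, -14)): ac - bd = -14, ad + bc = 5
print(mat_mul(A, cmat_inverse(A)))  # the identity matrix, with Fraction entries
```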
The set $\mathbb{C}$ also contains the matrix

$$i = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix},$$

for which $i^2 = -I$, and any $A \in \mathbb{C}$ can be represented in the form

$$A = aI + bi,$$

where $a, b \in \mathbb{R}$. The multiples $aI$, where $a \in \mathbb{R}$, form a subfield of $\mathbb{C}$ isomorphic to the real field $\mathbb{R}$. By identifying the real number $a$ with the complex number $aI$, we may regard $\mathbb{R}$ itself as contained in $\mathbb{C}$.

Thus we will now stop using matrices and use only the fact that $\mathbb{C}$ is a field containing $\mathbb{R}$ such that every $z \in \mathbb{C}$ can be represented in the form

$$z = x + iy,$$

where $x, y \in \mathbb{R}$ and $i \in \mathbb{C}$ satisfies $i^2 = -1$. The representation is necessarily unique, since $i \notin \mathbb{R}$. We call $x$ and $y$ the real and imaginary parts of $z$ and denote them by $\Re z$ and $\Im z$ respectively. Complex numbers of the form $iy$, where $y \in \mathbb{R}$, are said to be pure imaginary.

It is worth noting that $\mathbb{C}$ cannot be given the structure of an ordered field, since in an ordered field any nonzero square is positive, so that a sum of nonzero squares is never zero, whereas $i^2 + 1^2 = (-1) + 1 = 0$.

It is often suggestive to regard complex numbers as points of a plane, the complex 
number z = x + iy being the point with coordinates (x, y) in some chosen system of 
rectangular coordinates. 

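The complex conjugate of the complex number $z = x + iy$, where $x, y \in \mathbb{R}$, is the complex number $\bar z = x - iy$. In the geometrical representation of complex numbers, $\bar z$ is the reflection of $z$ in the $x$-axis. From the definition we at once obtain

$$\Re z = (z + \bar z)/2, \quad \Im z = (z - \bar z)/2i.$$

It is easily seen also that

$$\overline{z + w} = \bar z + \bar w, \quad \overline{zw} = \bar z\,\bar w, \quad \bar{\bar z} = z.$$

Moreover, $\bar z = z$ if and only if $z \in \mathbb{R}$. Thus the map $z \mapsto \bar z$ is an ‘involutory automorphism’ of the field $\mathbb{C}$, with the subfield $\mathbb{R}$ as its set of fixed points. It follows that $\overline{-z} = -\bar z$ and $\overline{z^{-1}} = \bar z^{-1}$ if $z \neq 0$.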

If $z = x + iy$, where $x, y \in \mathbb{R}$, then

$$z\bar z = (x + iy)(x - iy) = x^2 + y^2.$$

Hence $z\bar z$ is a positive real number for any nonzero $z \in \mathbb{C}$. The absolute value $|z|$ of the complex number $z$ is defined by

$$|0| = 0, \quad |z| = \sqrt{z\bar z} \text{ if } z \neq 0,$$

(with the positive value for the square root). This agrees with the definition in §3 if $z = x$ is a positive real number.
It follows at once from the definition that $|\bar z| = |z|$ for every $z \in \mathbb{C}$, and $z^{-1} = \bar z/|z|^2$ if $z \neq 0$.

Absolute values have the following properties: for all $z, w \in \mathbb{C}$,

(i) $|0| = 0$, $|z| > 0$ if $z \neq 0$;
(ii) $|zw| = |z||w|$;
(iii) $|z + w| \le |z| + |w|$.


The first property follows at once from the definition. To prove (ii), observe that both sides are non-negative and that

$$|zw|^2 = zw\,\overline{zw} = zw\,\bar z\bar w = z\bar z\,w\bar w = |z|^2|w|^2.$$

To prove (iii), we first evaluate $|z + w|^2$:

$$|z + w|^2 = (z + w)(\bar z + \bar w) = z\bar z + (z\bar w + w\bar z) + w\bar w = |z|^2 + 2\Re(z\bar w) + |w|^2.$$

Since $\Re(z\bar w) \le |z\bar w| = |z||w|$, this yields

$$|z + w|^2 \le |z|^2 + 2|z||w| + |w|^2 = (|z| + |w|)^2,$$

and (iii) follows by taking square roots.

Several other properties are consequences of these three, although they may also be verified directly. By taking $z = w = 1$ in (ii) and using (i), we obtain $|1| = 1$. By taking $z = w = -1$ in (ii) and using (i), we now obtain $|-1| = 1$. Taking $w = -1$ and $w = z^{-1}$ in (ii), we further obtain

$$|-z| = |z|, \quad |z^{-1}| = |z|^{-1} \text{ if } z \neq 0.$$

Again, by replacing $z$ by $z - w$ in (iii), and then interchanging the roles of $z$ and $w$, we obtain

$$\big||z| - |w|\big| \le |z - w|.$$

This shows that $|z|$ is a continuous function of $z$. In fact $\mathbb{C}$ is a metric space, with the metric $d(z, w) = |z - w|$. By considering real and imaginary parts separately, one verifies that this metric space is complete, i.e. every fundamental sequence is convergent, and that the Bolzano–Weierstrass property continues to hold, i.e. any bounded sequence of complex numbers has a convergent subsequence.

It will now be shown that any complex number has a square root. If $w = u + iv$ and $z = x + iy$, then $z^2 = w$ is equivalent to

$$x^2 - y^2 = u, \quad 2xy = v.$$

Since

$$(x^2 + y^2)^2 = (x^2 - y^2)^2 + (2xy)^2,$$

these equations imply

$$x^2 + y^2 = \sqrt{u^2 + v^2}.$$

Hence

$$x^2 = \{u + \sqrt{u^2 + v^2}\}/2.$$

Since the right side is positive if $v \neq 0$, $x$ is then uniquely determined apart from sign and $y = v/2x$ is uniquely determined by $x$. If $v = 0$, then $x = \pm\sqrt{u}$ and $y = 0$ when $u > 0$; $x = 0$ and $y = \pm\sqrt{-u}$ when $u < 0$; and $x = y = 0$ when $u = 0$.
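The case analysis above translates directly into code. This sketch (the function name is hypothetical; floating-point arithmetic is assumed, so for $u$ large and negative with small $v \neq 0$ the sum $u + \sqrt{u^2 + v^2}$ may lose accuracy) returns one of the two square roots; the other is its negative.

```python
import math


def complex_sqrt(u, v):
    """Return (x, y) with (x + iy)**2 == u + iv, following the case analysis above."""
    r = math.hypot(u, v)  # sqrt(u**2 + v**2)
    if v != 0:
        x = math.sqrt((u + r) / 2)  # u + r > 0 because v != 0
        return x, v / (2 * x)
    if u > 0:
        return math.sqrt(u), 0.0
    if u < 0:
        return 0.0, math.sqrt(-u)
    return 0.0, 0.0


x, y = complex_sqrt(-3.0, 4.0)
print(x, y, complex(x, y) ** 2)  # (x + iy)^2 reproduces -3 + 4i
```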

It follows that any quadratic polynomial

$$q(z) = az^2 + bz + c,$$

where $a, b, c \in \mathbb{C}$ and $a \neq 0$, has two complex roots, given by the well-known formula

$$z = \{-b \pm \sqrt{b^2 - 4ac}\}/2a.$$
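Combined with a complex square root, the formula gives both roots for arbitrary complex coefficients. A short sketch using the standard-library `cmath.sqrt` (the function name `quadratic_roots` is hypothetical):

```python
import cmath


def quadratic_roots(a, b, c):
    """Both roots of a z^2 + b z + c for complex a, b, c with a != 0."""
    if a == 0:
        raise ValueError("a must be nonzero")
    s = cmath.sqrt(b * b - 4 * a * c)  # either square root will do: +s and -s give both roots
    return (-b + s) / (2 * a), (-b - s) / (2 * a)


print(quadratic_roots(1, 0, 1))           # the roots i and -i of z^2 + 1
print(quadratic_roots(1, -(1 + 1j), 1j))  # the roots of (z - 1)(z - i)
```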


However, much more is true. The so-called fundamental theorem of algebra asserts that any polynomial

$$f(z) = a_0 z^n + a_1 z^{n-1} + \cdots + a_n,$$

where $a_0, a_1, \ldots, a_n \in \mathbb{C}$, $n \ge 1$ and $a_0 \neq 0$, has a complex root. Thus by adjoining to the real field $\mathbb{R}$ a root of the polynomial $z^2 + 1$ we ensure that every non-constant polynomial has a root. Today the fundamental theorem of algebra is considered to belong to analysis, rather than to algebra. It is useful to retain the name, however, as a reminder that our own pronouncements may seem equally quaint in the future.

Our proof of the theorem will use the fact that any polynomial is differentiable, since sums and products of differentiable functions are again differentiable, and hence also continuous. We first prove

Proposition 29 Let $G \subseteq \mathbb{C}$ be an open set and $E$ a proper subset (possibly empty) of $G$ such that each point of $G$ has a neighbourhood containing at most one point of $E$. If $f: G \to \mathbb{C}$ is a continuous map which at every point of $G \setminus E$ is differentiable and has a nonzero derivative, then $f(G)$ is an open subset of $\mathbb{C}$.


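Proof Evidently $G \setminus E$ is an open set. We show first that $f(G \setminus E)$ is also an open set. Let $\zeta \in G \setminus E$. Then $f$ is differentiable at $\zeta$ and $\rho = |f'(\zeta)| > 0$. We can choose $\delta > 0$ so that the closed disc $B = \{z \in \mathbb{C} : |z - \zeta| \le \delta\}$ contains no point of $E$, is contained in $G$ and

$$|f(z) - f(\zeta)| \ge \rho|z - \zeta|/2 \quad \text{for every } z \in B.$$

This is possible because, by the differentiability of $f$ at $\zeta$, we have $|f(z) - f(\zeta) - f'(\zeta)(z - \zeta)| \le \rho|z - \zeta|/2$ for all $z$ sufficiently close to $\zeta$. In particular, if $S = \{z \in \mathbb{C} : |z - \zeta| = \delta\}$ is the boundary of $B$, then

$$|f(z) - f(\zeta)| \ge \rho\delta/2 \quad \text{for every } z \in S.$$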


Choose $\omega \in \mathbb{C}$ so that $|\omega - f(\zeta)| < \rho\delta/4$ and consider the minimum in the compact set $B$ of the continuous real-valued function $\phi(z) = |f(z) - \omega|$. On the boundary $S$ we have

$$\phi(z) \ge |f(z) - f(\zeta)| - |f(\zeta) - \omega| > \rho\delta/2 - \rho\delta/4 = \rho\delta/4.$$

Since $\phi(\zeta) < \rho\delta/4$, it follows that $\phi$ attains its minimum value in $B$ at an interior point $z_0$. Since $z_0 \notin E$, we can take

$$z = z_0 - hf'(z_0)^{-1}\{f(z_0) - \omega\},$$

where $h > 0$ is so small that $|z - \zeta| < \delta$. Then

$$f(z) - \omega = f(z_0) - \omega + f'(z_0)(z - z_0) + o(h) = (1 - h)\{f(z_0) - \omega\} + o(h).$$

If $f(z_0) \neq \omega$ then, for sufficiently small $h > 0$,

$$|f(z) - \omega| \le (1 - h/2)|f(z_0) - \omega| < |f(z_0) - \omega|,$$

which contradicts the definition of $z_0$. We conclude that $f(z_0) = \omega$. Thus $f(G \setminus E)$ contains not only $f(\zeta)$, but also an open disc $\{\omega \in \mathbb{C} : |\omega - f(\zeta)| < \rho\delta/4\}$ surrounding it. Since this holds for every $\zeta \in G \setminus E$, it follows that $f(G \setminus E)$ is an open set.
Now let $\zeta \in E$ and assume that $f(G)$ does not contain any open neighbourhood of $\omega := f(\zeta)$. Then $f(z) \neq \omega$ for every $z \in G \setminus E$, since $f(G \setminus E)$ is open. Choose $\delta > 0$ so small that the closed disc $B = \{z \in \mathbb{C} : |z - \zeta| \le \delta\}$ is contained in $G$ and contains no point of $E$ except $\zeta$. If $S = \{z \in \mathbb{C} : |z - \zeta| = \delta\}$ is the boundary of $B$, there exists an open disc $U$ with centre $\omega$ that contains no point of $f(S)$. It follows that if $A = \{z \in \mathbb{C} : 0 < |z - \zeta| < \delta\}$ is the annulus $B \setminus (S \cup \{\zeta\})$, then $U \setminus \{\omega\}$ is the union of the disjoint nonempty open sets $U \cap \{\mathbb{C} \setminus f(B)\}$ and $U \cap f(A)$. Since $U \setminus \{\omega\}$ is a connected set (because it is path-connected), this is a contradiction.


From Proposition 29 we readily obtain

Theorem 30 If

$$f(z) = z^n + a_1 z^{n-1} + \cdots + a_n$$

is a polynomial of degree $n \ge 1$ with complex coefficients $a_1, \ldots, a_n$, then $f(\zeta) = 0$ for some $\zeta \in \mathbb{C}$.

Proof Since

$$f(z)/z^n = 1 + a_1/z + \cdots + a_n/z^n \to 1 \quad \text{as } |z| \to \infty,$$

we can choose $R > 0$ so large that

$$|f(z)| > |f(0)| \quad \text{for all } z \in \mathbb{C} \text{ such that } |z| = R.$$

Since the closed disc $D = \{z \in \mathbb{C} : |z| \le R\}$ is compact, the continuous function $|f(z)|$ assumes its minimum value in $D$ at a point $\zeta$ in the interior $G = \{z \in \mathbb{C} : |z| < R\}$. The function $f(z)$ is differentiable in $G$ and the set $E$ of all points of $G$ at which the derivative $f'(z)$ vanishes is finite. (In fact $E$ contains at most $n - 1$ points, by Proposition II.15.) Hence, by Proposition 29, $f(G)$ is an open subset of $\mathbb{C}$. Since $|f(z)| \ge |f(\zeta)|$ for all $z \in G$, this implies $f(\zeta) = 0$.
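The key step in the proof of Proposition 29 moves from $z_0$ to $z_0 - hf'(z_0)^{-1}\{f(z_0) - \omega\}$ with small $h > 0$, which reduces $|f - \omega|$ whenever $f(z_0) \neq \omega$ and $f'(z_0) \neq 0$. The sketch below repeats that step with $\omega = 0$, halving $h$ until $|f|$ actually decreases. This is a heuristic illustration (essentially a damped Newton iteration), not the book's method, and it carries no convergence guarantee: for instance, started on the real axis for a real polynomial with no real roots it can stall. The helper names are hypothetical.

```python
def poly_and_derivative(coeffs, z):
    """Horner evaluation of f(z) and f'(z) for f(z) = a0 z^n + a1 z^(n-1) + ... + an."""
    f, df = 0j, 0j
    for a in coeffs:
        df = df * z + f
        f = f * z + a
    return f, df


def descend_to_root(coeffs, z, max_iter=200, tol=1e-12):
    """Repeat z -> z - h f'(z)^(-1) f(z), halving h until |f| decreases."""
    z = complex(z)
    for _ in range(max_iter):
        f, df = poly_and_derivative(coeffs, z)
        if abs(f) < tol:
            break
        if df == 0:
            z += 1e-3 * (1 + 1j)  # nudge off a zero of f' (a point of E)
            continue
        step = f / df
        h = 1.0
        while h > 1e-12 and abs(poly_and_derivative(coeffs, z - h * step)[0]) >= abs(f):
            h /= 2
        z -= h * step
    return z


print(descend_to_root([1, 0, 0, -1], 1.2 + 0.3j))  # a cube root of 1
print(descend_to_root([1, 0, 1], 0.3 + 0.8j))      # a square root of -1
```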


The first ‘proof’ of the fundamental theorem of algebra was given by d’Alembert (1746). Assuming the convergence of what are now called Puiseux expansions, he showed that if a polynomial assumes a value $w \neq 0$, then it also assumes a value $w'$ such that $|w'| < |w|$. A much simpler way of reaching this conclusion, which required only the existence of $k$-th roots of complex numbers, was given by Argand (1814). Cauchy (1820) gave a similar proof and, with latter-day rigour, it is still reproduced in textbooks. The proof we have given rests on the same general principle, but uses neither the existence of $k$-th roots nor the continuity of the derivative. These may be called differential calculus proofs.

The basis for an algebraic proof was given by Euler (1749). His proof was completed by Lagrange (1772) and then simplified by Laplace (1795). The algebraic proof starts from the facts that $\mathbb{R}$ is an ordered field, that any positive element of $\mathbb{R}$ has a square root in $\mathbb{R}$ and that any polynomial of odd degree with coefficients from $\mathbb{R}$ has a root in $\mathbb{R}$. It then shows that any polynomial of degree $n \ge 1$ with coefficients from $\mathbb{C} = \mathbb{R}(i)$, where $i^2 = -1$, has a root in $\mathbb{C}$ by using induction on the highest power of 2 which divides $n$.

Gauss (1799) objected to this proof, because it assumed that there were ‘roots’ and 
then proved that these roots were complex numbers. The difficulty disappears if one 
uses the result, due to Kronecker (1887), that a polynomial with coefficients from an
arbitrary field K decomposes into linear factors in a field L which is a finite extension 
of K. This general result, which is not difficult to prove, is actually all that is required 
for many of the previous applications of the fundamental theorem of algebra. 

It is often said that the first rigorous proof of the fundamental theorem of algebra 
was given by Gauss (1799). Like d’Alembert, however, Gauss assumed properties of 
algebraic curves which were unknown at the time. The gaps in this proof of Gauss 
were filled by Ostrowski (1920). 

There are also topological proofs of the fundamental theorem of algebra, e.g. using 
the notion of topological degree. This type of proof is intuitively appealing, but not 
so easy to make rigorous. Finally, there are complex analysis proofs, which depend 
ultimately on Cauchy’s theorem on complex line integrals. (The latter proofs are more 
closely related to either the differential calculus proofs or the topological proofs than 
they seem to be at first sight.) 

The exponential function $e^z$ may be defined, for any complex value of $z$, as the sum of the everywhere convergent power series

$$\sum_{n \ge 0} z^n/n! = 1 + z + z^2/2! + z^3/3! + \cdots.$$

It is easily verified that $w(z) = e^z$ is a solution of the differential equation $dw/dz = w$ satisfying the initial condition $w(0) = 1$.
For any $\zeta \in \mathbb{C}$, put $g(z) = e^{\zeta - z}e^z$. Differentiating by the product rule, we obtain

$$g'(z) = -e^{\zeta - z}e^z + e^{\zeta - z}e^z = 0.$$

Since this holds for all $z \in \mathbb{C}$, $g(z)$ is a constant. Thus $g(z) = g(0) = e^\zeta$. Replacing $\zeta$ by $\zeta + z$, we obtain the addition theorem for the exponential function:

$$e^z e^\zeta = e^{z + \zeta} \quad \text{for all } z, \zeta \in \mathbb{C}.$$

In particular, $e^{-z}e^z = 1$ and hence $e^z \neq 0$ for every $z \in \mathbb{C}$.

The power series for $e^z$ shows that, for any real $y$, $e^{-iy}$ is the complex conjugate of $e^{iy}$ and hence

$$|e^{iy}|^2 = e^{iy}e^{-iy} = 1.$$

It follows that, for all real $x, y$,

$$|e^{x + iy}| = |e^x||e^{iy}| = e^x.$$


The trigonometric functions $\cos z$ and $\sin z$ may be defined, for any complex value of $z$, by the formulas of Euler (1740):

$$\cos z = (e^{iz} + e^{-iz})/2, \quad \sin z = (e^{iz} - e^{-iz})/2i.$$

It follows at once that

$$e^{iz} = \cos z + i\sin z,$$
$$\cos 0 = 1, \quad \sin 0 = 0,$$
$$\cos(-z) = \cos z, \quad \sin(-z) = -\sin z,$$

and the relation $e^{iz}e^{-iz} = 1$ implies that

$$\cos^2 z + \sin^2 z = 1.$$
From the power series for $e^z$ we obtain, for every $z \in \mathbb{C}$,

$$\cos z = \sum_{n \ge 0} (-1)^n z^{2n}/(2n)! = 1 - z^2/2! + z^4/4! - \cdots,$$

$$\sin z = \sum_{n \ge 0} (-1)^n z^{2n+1}/(2n+1)! = z - z^3/3! + z^5/5! - \cdots.$$

From the differential equation we obtain, for every $z \in \mathbb{C}$,

$$d(\cos z)/dz = -\sin z, \quad d(\sin z)/dz = \cos z.$$

From the addition theorem we obtain, for all $z, \zeta \in \mathbb{C}$,

$$\cos(z + \zeta) = \cos z\cos\zeta - \sin z\sin\zeta,$$
$$\sin(z + \zeta) = \sin z\cos\zeta + \cos z\sin\zeta.$$
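The identities above can be spot-checked at complex arguments by summing the exponential series directly. In this sketch the series is truncated after a fixed number of terms (an assumption adequate only for moderate $|z|$), and the helper names are hypothetical.

```python
def exp_series(z, terms=60):
    """Partial sum of sum_{n >= 0} z**n / n!."""
    total, term = 0j, 1 + 0j
    for n in range(terms):
        total += term
        term *= z / (n + 1)  # turns z**n/n! into z**(n+1)/(n+1)!
    return total


def cos_series(z):
    return (exp_series(1j * z) + exp_series(-1j * z)) / 2


def sin_series(z):
    return (exp_series(1j * z) - exp_series(-1j * z)) / 2j


z, w = 0.7 - 1.3j, -2.1 + 0.4j
print(abs(cos_series(z) ** 2 + sin_series(z) ** 2 - 1))  # rounding error only
print(abs(cos_series(z + w) - (cos_series(z) * cos_series(w) - sin_series(z) * sin_series(w))))
```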


We now consider periodicity properties. By the addition theorem for the exponential function, $e^{z+h} = e^z$ if and only if $e^h = 1$. Thus the exponential function has period $h$ if and only if $e^h = 1$. Since $e^h = 1$ implies $h = ix$ for some real $x$ (because $|e^h| = e^{\Re h}$), and since $\cos x$ and $\sin x$ are real for real $x$, the periods correspond to those real values of $x$ for which

$$\cos x = 1, \quad \sin x = 0.$$

In fact, the second relation follows from the first, since $\cos^2 x + \sin^2 x = 1$.

By bracketing the power series for $\cos x$ in the form

$$\cos x = \left(1 - \frac{x^2}{2!} + \frac{x^4}{4!}\right) - \left(1 - \frac{x^2}{7 \cdot 8}\right)\frac{x^6}{6!} - \left(1 - \frac{x^2}{11 \cdot 12}\right)\frac{x^{10}}{10!} - \cdots$$

and taking $x = 2$, we see that $\cos 2 < 0$, since the first bracket is then $-1/3$ and each later bracket is positive. Since $\cos 0 = 1$ and $\cos x$ is a continuous function of $x$, there is a least positive value $\xi$ of $x$ such that $\cos\xi = 0$. Then $\sin^2\xi = 1$. In fact $\sin\xi = 1$, since $\sin 0 = 0$ and $\sin' x = \cos x > 0$ for $0 \le x < \xi$. Thus

$$0 < \sin x < 1 \quad \text{for } 0 < x < \xi$$

and

$$e^{i\xi} = \cos\xi + i\sin\xi = i.$$

As usual, we now write $\pi = 2\xi$. From $e^{\pi i/2} = i$, we obtain

$$e^{\pi i} = i^2 = -1, \quad e^{2\pi i} = (-1)^2 = 1.$$

Thus the exponential function has period $2\pi i$. It follows that it also has period $2n\pi i$, for every $n \in \mathbb{Z}$. We will show that there are no other periods.
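The definition $\pi = 2\xi$ is constructive: $\xi$ can be located from the power series for $\cos x$ alone. The sketch below scans $[0, 2]$ for the first sign change of a truncated cosine series and then bisects; the step size and number of terms are assumptions, and the scan presumes that $\cos x$ does not change sign twice within one step (true here, although not yet proved in the text at this point). The helper names are hypothetical.

```python
def cos_series(x, terms=30):
    """Partial sum of sum_{n >= 0} (-1)**n x**(2n) / (2n)! for real x."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= -x * x / ((2 * n + 1) * (2 * n + 2))
    return total


def xi_least_zero(step=1e-2, iterations=60):
    """Least positive zero of cos: scan from 0, then bisect the bracketing step."""
    lo = 0.0
    while cos_series(lo + step) > 0:
        lo += step
    hi = lo + step
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if cos_series(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(cos_series(2.0))      # negative, as the bracketing argument shows
print(2 * xi_least_zero())  # an approximation to pi
```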
Suppose $e^{ix'} = 1$ for some $x' \in \mathbb{R}$ and choose $n \in \mathbb{Z}$ so that $n \le x'/2\pi < n + 1$. If $x = x' - 2n\pi$, then $e^{ix} = 1$ and $0 \le x < 2\pi$. If $x \neq 0$, then $0 < x/4 < \pi/2$ and hence $0 < \sin(x/4) < 1$. Thus $e^{ix/4} \neq \pm 1, \pm i$. But this is a contradiction, since

$$(e^{ix/4})^4 = e^{ix} = 1$$

and the only complex numbers whose fourth power is 1 are $\pm 1, \pm i$.

We show next that the map $x \mapsto e^{ix}$ maps the interval $0 \le x < 2\pi$ bijectively onto the unit circle, i.e. the set of all complex numbers $w$ such that $|w| = 1$. We already know that $|e^{ix}| = 1$ if $x \in \mathbb{R}$. If $e^{ix} = e^{ix'}$, where $0 \le x \le x' < 2\pi$, then $e^{i(x' - x)} = 1$. Since $0 \le x' - x < 2\pi$, this implies $x' = x$.

It remains to show that if $u, v \in \mathbb{R}$ and $u^2 + v^2 = 1$, then

$$u = \cos x, \quad v = \sin x$$

for some $x$ such that $0 \le x < 2\pi$. If $u, v \ge 0$, then also $u, v \le 1$. Hence $u = \cos x$ for some $x$ such that $0 \le x \le \pi/2$. It follows that $v = \sin x$, since $\sin^2 x = 1 - u^2 = v^2$ and $\sin x \ge 0$. The other possible sign combinations for $u, v$ may be reduced to the case $u, v \ge 0$ by means of the relations

$$\sin(x + \pi/2) = \cos x, \quad \cos(x + \pi/2) = -\sin x.$$


If $z$ is any nonzero complex number, then $r = |z| > 0$ and $|z/r| = 1$. It follows that any nonzero complex number $z$ can be uniquely expressed in the form

$$z = re^{i\theta},$$

where $r, \theta$ are real numbers such that $r > 0$ and $0 \le \theta < 2\pi$. We call these $r, \theta$ the polar coordinates of $z$ and $\theta$ the argument of $z$. If $z = x + iy$, where $x, y \in \mathbb{R}$, then $r = \sqrt{x^2 + y^2}$ and

$$x = r\cos\theta, \quad y = r\sin\theta.$$

Hence, in the geometrical representation of complex numbers by points of a plane, $r$ is the distance of $z$ from $O$ and $\theta$ measures the angle between the positive $x$-axis and the ray $Oz$.

We now show that the exponential function assumes every nonzero complex value $w$. Since $|w| > 0$, we have $|w| = e^x$ for some $x \in \mathbb{R}$. If $w' = w/|w|$, then $|w'| = 1$ and so $w' = e^{iy}$ for some $y \in \mathbb{R}$. Consequently,

$$w = |w|w' = e^x e^{iy} = e^{x + iy}.$$

It follows that, for any positive integer $n$, a nonzero complex number $w$ has $n$ distinct $n$-th roots. In fact, if $w = e^z$, then $w$ has the distinct $n$-th roots

$$\zeta\omega^k \quad (k = 0, 1, \ldots, n - 1),$$

where $\zeta = e^{z/n}$ and $\omega = e^{2\pi i/n}$. In the geometrical representation of complex numbers by points of a plane, the $n$-th roots of $w$ are the vertices of an $n$-sided regular polygon.
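The recipe $\zeta\omega^k$ can be sketched with the standard-library `cmath` module, where `cmath.log` is used (as an assumed convenience) to produce one $z$ with $e^z = w$; the function name `nth_roots` is hypothetical.

```python
import cmath


def nth_roots(w, n):
    """The n distinct n-th roots zeta * omega**k (k = 0, ..., n-1) of w != 0,
    where w = e**z, zeta = e**(z/n) and omega = e**(2 pi i/n)."""
    if w == 0 or n < 1:
        raise ValueError("need w != 0 and n >= 1")
    z = cmath.log(w)  # one z with e**z == w
    zeta = cmath.exp(z / n)
    omega = cmath.exp(2j * cmath.pi / n)
    return [zeta * omega ** k for k in range(n)]


for r in nth_roots(-8, 3):
    print(r, r ** 3)  # each cube is -8 up to rounding
```

The three roots of $-8$ all have modulus 2 and form an equilateral triangle, in line with the regular-polygon description.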
It remains to show that $\pi$ has its usual geometric significance. Since the continuously differentiable function $z(t) = e^{it}$ describes the unit circle as $t$ increases from 0 to $2\pi$, the length of the unit circle is

$$L = \int_0^{2\pi} |z'(t)|\, dt.$$

But $|z'(t)| = 1$, since $z'(t) = ie^{it}$, and hence $L = 2\pi$.

In a course of complex analysis one would now define complex line integrals, prove Cauchy’s theorem and deduce its numerous consequences. The miracle is that, if $D = \{z \in \mathbb{C} : |z| < \rho\}$ is a disc with centre the origin, then any differentiable function $f: D \to \mathbb{C}$ can be represented by a power series,

$$f(z) = c_0 + c_1 z + c_2 z^2 + \cdots,$$

which is convergent for $|z| < \rho$. It follows that, if $f$ vanishes at a sequence of distinct points converging to 0, then it vanishes everywhere. This is the basis for analytic continuation.

A complex-valued function $f$ is said to be holomorphic at $a \in \mathbb{C}$ if, in some neighbourhood of $a$, it can be represented as the sum of a convergent power series (its ‘Taylor’ series):

$$f(z) = c_0 + c_1(z - a) + c_2(z - a)^2 + \cdots.$$

It is said to be meromorphic at $a \in \mathbb{C}$ if, for some integer $n$, it can be represented near $a$ as the sum of a convergent series (its ‘Laurent’ series):

$$f(z) = c_0(z - a)^{-n} + c_1(z - a)^{-n+1} + c_2(z - a)^{-n+2} + \cdots.$$

If $c_0 \neq 0$, then $(z - a)f'(z)/f(z) \to -n$ as $z \to a$. If also $n > 0$ we say that $a$ is a pole of $f$ of order $n$ with residue $c_{n-1}$. If $n = 1$, the residue is $c_0$ and the pole is said to be simple.

Let $G$ be a nonempty connected open subset of $\mathbb{C}$. From what has been said, if $f: G \to \mathbb{C}$ is differentiable throughout $G$, then it is also holomorphic throughout $G$. If $f_1$ and $f_2$ are holomorphic throughout $G$ and $f_2$ is not identically zero, then the quotient $f = f_1/f_2$ is meromorphic throughout $G$. Conversely, it may be shown that if $f$ is meromorphic throughout $G$, then $f = f_1/f_2$ for some functions $f_1, f_2$ which are holomorphic throughout $G$.

The behaviour of many functions is best understood by studying them in the 
complex domain, as the exponential and trigonometric functions already illustrate. 
Complex numbers, when they first appeared, were called ‘impossible’ numbers. They 
are now indispensable. 


6 Quaternions and Octonions 


Quaternions were invented by Hamilton (1843) in order to be able to ‘multiply’ points of 3-dimensional space, in the same way that complex numbers enable one to multiply points of a plane. The definition of quaternions adopted here will be analogous to our definition of complex numbers.

We define a quaternion to be a $2 \times 2$ matrix of the form

$$A = \begin{pmatrix} a & b \\ -\bar b & \bar a \end{pmatrix},$$

where $a$ and $b$ are complex numbers and the bar denotes complex conjugation. The set of all quaternions will be denoted by $\mathbb{H}$. We may define addition and multiplication in $\mathbb{H}$ to be matrix addition and multiplication, since $\mathbb{H}$ is closed under these operations. Furthermore $\mathbb{H}$ contains

$$O = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}, \quad I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},$$

and $A \in \mathbb{H}$ implies $-A \in \mathbb{H}$.

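It follows from the properties of matrix addition and multiplication that addition and multiplication of quaternions have the properties (A2)–(A5) and (M3)–(M4), with $O$ and $I$ as identity elements for addition and multiplication respectively. However, (M2) no longer holds, since multiplication is not always commutative. For example,

$$\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}\begin{pmatrix} 0 & i \\ i & 0 \end{pmatrix} = \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}, \quad \begin{pmatrix} 0 & i \\ i & 0 \end{pmatrix}\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} = \begin{pmatrix} -i & 0 \\ 0 & i \end{pmatrix}.$$

On the other hand, there are now two distributive laws: for all $A, B, C \in \mathbb{H}$,

$$A(B + C) = AB + AC, \quad (B + C)A = BA + CA.$$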
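The failure of commutativity can be reproduced directly from the matrix definition. A minimal sketch (helper names hypothetical; Python complex numbers stand in for the entries) that also checks, up to rounding, that a product of two such matrices again has the form $\begin{pmatrix} a & b \\ -\bar b & \bar a \end{pmatrix}$:

```python
def quat(a, b):
    """The matrix ((a, b), (-conj(b), conj(a))) for complex a, b."""
    a, b = complex(a), complex(b)
    return ((a, b), (-b.conjugate(), a.conjugate()))


def mul(A, B):
    """2 x 2 matrix product."""
    return tuple(
        tuple(A[r][0] * B[0][c] + A[r][1] * B[1][c] for c in range(2))
        for r in range(2)
    )


def is_quaternion(M, tol=1e-12):
    """Check that M has the shape ((a, b), (-conj(b), conj(a)))."""
    (a, b), (c, d) = M
    return abs(c + b.conjugate()) < tol and abs(d - a.conjugate()) < tol


P = quat(0, 1)   # ((0, 1), (-1, 0))
Q = quat(0, 1j)  # ((0, i), (i, 0))
print(mul(P, Q))  # the matrix quat(1j, 0), i.e. ((i, 0), (0, -i))
print(mul(Q, P))  # the matrix quat(-1j, 0), i.e. ((-i, 0), (0, i))
```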


It is easily seen that $A \in \mathbb{H}$ is in the centre of $\mathbb{H}$, i.e. $AB = BA$ for every $B \in \mathbb{H}$, if and only if $A = \lambda I$ for some real number $\lambda$. Since the map $\lambda \mapsto \lambda I$ preserves sums and products, we can regard $\mathbb{R}$ as contained in $\mathbb{H}$ by identifying the real number $\lambda$ with the quaternion $\lambda I$.

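We define the conjugate of the quaternion

$$A = \begin{pmatrix} a & b \\ -\bar b & \bar a \end{pmatrix}$$

to be the quaternion

$$\bar A = \begin{pmatrix} \bar a & -b \\ \bar b & a \end{pmatrix}.$$

It is easily verified that

$$\overline{A + B} = \bar A + \bar B, \quad \overline{AB} = \bar B\bar A, \quad \bar{\bar A} = A.$$

Furthermore,

$$A\bar A = \bar A A = n(A), \quad A + \bar A = t(A),$$

where the norm $n(A)$ and trace $t(A)$ are both real:

$$n(A) = a\bar a + b\bar b, \quad t(A) = a + \bar a.$$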


Moreover, $n(A) > 0$ if $A \neq 0$. It follows that any quaternion $A \neq 0$ has a multiplicative inverse: if $A^{-1} = n(A)^{-1}\bar A$, then

$$A^{-1}A = AA^{-1} = 1.$$

Norms and traces have the following properties: for all $A, B \in \mathbb{H}$,

$$t(\bar A) = t(A), \quad n(\bar A) = n(A), \quad t(A + B) = t(A) + t(B), \quad n(AB) = n(A)n(B).$$

Only the last property is not immediately obvious, and it can be proved in one line:

$$n(AB) = \overline{AB}\,AB = \bar B\bar A AB = n(A)\bar B B = n(A)n(B).$$

Furthermore, for any $A \in \mathbb{H}$ we have

$$A^2 - t(A)A + n(A) = 0,$$

since the left side can be written in the form $A^2 - (A + \bar A)A + \bar A A$. (The relation is actually just a special case of the ‘Cayley–Hamilton theorem’ of linear algebra.) It follows that the quadratic polynomial $x^2 + 1$ has not two, but infinitely many quaternionic roots: every quaternion $A$ with $t(A) = 0$ and $n(A) = 1$ satisfies $A^2 = -1$.
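Both the relation $A^2 - t(A)A + n(A) = 0$ and the abundance of square roots of $-1$ can be checked numerically in the matrix model. In this sketch (hypothetical helper names), `unit_pure(theta, phi)` builds a quaternion with $t(A) = 0$ and $n(A) = 1$, i.e. with $a$ purely imaginary and $|a|^2 + |b|^2 = 1$, from two angles; each choice of angles gives a root of $x^2 + 1$.

```python
import math


def quat(a, b):
    """The matrix ((a, b), (-conj(b), conj(a)))."""
    a, b = complex(a), complex(b)
    return ((a, b), (-b.conjugate(), a.conjugate()))


def mul(A, B):
    return tuple(
        tuple(A[r][0] * B[0][c] + A[r][1] * B[1][c] for c in range(2))
        for r in range(2)
    )


def norm(A):
    """n(A) = a conj(a) + b conj(b)."""
    a, b = A[0]
    return abs(a) ** 2 + abs(b) ** 2


def trace(A):
    """t(A) = a + conj(a) = 2 Re a."""
    return 2 * A[0][0].real


def cayley_hamilton_residual(A):
    """Largest entry (in absolute value) of A^2 - t(A) A + n(A) I."""
    A2, t, n = mul(A, A), trace(A), norm(A)
    return max(
        abs(A2[r][c] - t * A[r][c] + (n if r == c else 0))
        for r in range(2) for c in range(2)
    )


def unit_pure(theta, phi):
    """t(A) = 0 and n(A) = 1: a = i cos(theta), b = sin(theta)(cos(phi) + i sin(phi))."""
    return quat(1j * math.cos(theta),
                complex(math.sin(theta) * math.cos(phi), math.sin(theta) * math.sin(phi)))


print(cayley_hamilton_residual(quat(1 + 2j, 3 - 1j)))  # rounding error only
for theta, phi in [(0.3, 1.1), (1.2, -0.4), (2.5, 3.0)]:
    S = unit_pure(theta, phi)
    print(mul(S, S))  # each is, up to rounding, the matrix of -1
```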


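If we put

$$I = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \quad J = \begin{pmatrix} 0 & i \\ i & 0 \end{pmatrix}, \quad K = \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix},$$

then

$$I^2 = J^2 = K^2 = -1,$$
$$IJ = K = -JI, \quad JK = I = -KJ, \quad KI = J = -IK.$$

Moreover, any quaternion $A$ can be uniquely represented in the form

$$A = a_0 + a_1 I + a_2 J + a_3 K,$$

where $a_0, \ldots, a_3 \in \mathbb{R}$. In fact this is equivalent to the previous representation with

$$a = a_0 + ia_3, \quad b = a_1 + ia_2.$$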

The corresponding representation of the conjugate quaternion is

$$\bar A = a_0 - a_1 I - a_2 J - a_3 K.$$

Hence $\bar A = A$ if and only if $a_1 = a_2 = a_3 = 0$ and $\bar A = -A$ if and only if $a_0 = 0$. A quaternion $A$ is said to be pure if $\bar A = -A$. Thus any quaternion can be uniquely represented as the sum of a real number and a pure quaternion.

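It follows from the multiplication table for the units $I, J, K$ that $A = a_0 + a_1 I + a_2 J + a_3 K$ has norm

$$n(A) = a_0^2 + a_1^2 + a_2^2 + a_3^2.$$

Consequently, if $A$ has coordinates $\alpha_0, \ldots, \alpha_3$ and $B$ has coordinates $\beta_0, \ldots, \beta_3$, the relation $n(A)n(B) = n(AB)$ may be written in the form

$$(\alpha_0^2 + \alpha_1^2 + \alpha_2^2 + \alpha_3^2)(\beta_0^2 + \beta_1^2 + \beta_2^2 + \beta_3^2) = \gamma_0^2 + \gamma_1^2 + \gamma_2^2 + \gamma_3^2,$$

where

$$\begin{aligned}
\gamma_0 &= \alpha_0\beta_0 - \alpha_1\beta_1 - \alpha_2\beta_2 - \alpha_3\beta_3,\\
\gamma_1 &= \alpha_0\beta_1 + \alpha_1\beta_0 + \alpha_2\beta_3 - \alpha_3\beta_2,\\
\gamma_2 &= \alpha_0\beta_2 - \alpha_1\beta_3 + \alpha_2\beta_0 + \alpha_3\beta_1,\\
\gamma_3 &= \alpha_0\beta_3 + \alpha_1\beta_2 - \alpha_2\beta_1 + \alpha_3\beta_0.
\end{aligned}$$

This ‘4-squares identity’ was already known to Euler (1770).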
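Since every quantity in the 4-squares identity is a polynomial with integer coefficients, the identity can be verified exactly on integer inputs. A small sketch (helper names hypothetical; the random test vectors and fixed seed are arbitrary choices) computes the $\gamma$'s from the formulas above and checks the identity:

```python
import random


def gammas(alpha, beta):
    """The coefficients gamma_0..gamma_3 of the quaternion product AB."""
    a0, a1, a2, a3 = alpha
    b0, b1, b2, b3 = beta
    return (
        a0 * b0 - a1 * b1 - a2 * b2 - a3 * b3,
        a0 * b1 + a1 * b0 + a2 * b3 - a3 * b2,
        a0 * b2 - a1 * b3 + a2 * b0 + a3 * b1,
        a0 * b3 + a1 * b2 - a2 * b1 + a3 * b0,
    )


def sum_of_squares(v):
    return sum(c * c for c in v)


print(gammas((1, 1, 1, 1), (1, 2, 3, 4)))  # (-8, 4, 2, 6): 64 + 16 + 4 + 36 = 4 * 30

rng = random.Random(0)
for _ in range(1000):
    alpha = [rng.randint(-50, 50) for _ in range(4)]
    beta = [rng.randint(-50, 50) for _ in range(4)]
    assert sum_of_squares(alpha) * sum_of_squares(beta) == sum_of_squares(gammas(alpha, beta))
```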

An important application of quaternions is to the parametrization of rotations in 3-dimensional space. In describing this application it will be convenient to denote quaternions now by lower case letters. In particular, we will write $i, j, k$ in place of $I, J, K$.

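Let $u$ be a quaternion with norm $n(u) = 1$, and consider the mapping $T: \mathbb{H} \to \mathbb{H}$ defined by

$$Tx = uxu^{-1}.$$

Evidently

$$T(x + y) = Tx + Ty, \quad T(xy) = (Tx)(Ty), \quad T(\lambda x) = \lambda Tx \text{ if } \lambda \in \mathbb{R}.$$

Moreover, since $u^{-1} = \bar u$,

$$\overline{Tx} = T\bar x.$$

It follows that

$$n(Tx) = n(x),$$

since

$$n(Tx) = Tx\,\overline{Tx} = Tx\,T\bar x = T(x\bar x) = n(x)T1 = n(x).$$

Furthermore, $T$ maps pure quaternions into pure quaternions, since $\bar x = -x$ implies

$$\overline{Tx} = T\bar x = -Tx.$$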
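If we write

$$x = \xi_1 i + \xi_2 j + \xi_3 k,$$

then

$$Tx = y = \eta_1 i + \eta_2 j + \eta_3 k,$$

where $\eta_\mu = \sum_{\nu=1}^{3} \beta_{\mu\nu}\xi_\nu$ for some $\beta_{\mu\nu} \in \mathbb{R}$. Since

$$\eta_1^2 + \eta_2^2 + \eta_3^2 = \xi_1^2 + \xi_2^2 + \xi_3^2,$$

the matrix $V = (\beta_{\mu\nu})$ is orthogonal: $V^t = V^{-1}$.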
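Thus with every quaternion $u$ with norm 1 there is associated a $3 \times 3$ orthogonal matrix $V = (\beta_{\mu\nu})$. Explicitly, if

$$u = \alpha_0 + \alpha_1 i + \alpha_2 j + \alpha_3 k,$$

where

$$\alpha_0^2 + \alpha_1^2 + \alpha_2^2 + \alpha_3^2 = 1,$$

then

$$\begin{aligned}
\beta_{11} &= \alpha_0^2 + \alpha_1^2 - \alpha_2^2 - \alpha_3^2, & \beta_{12} &= 2(\alpha_1\alpha_2 - \alpha_0\alpha_3), & \beta_{13} &= 2(\alpha_1\alpha_3 + \alpha_0\alpha_2),\\
\beta_{21} &= 2(\alpha_1\alpha_2 + \alpha_0\alpha_3), & \beta_{22} &= \alpha_0^2 - \alpha_1^2 + \alpha_2^2 - \alpha_3^2, & \beta_{23} &= 2(\alpha_2\alpha_3 - \alpha_0\alpha_1),\\
\beta_{31} &= 2(\alpha_1\alpha_3 - \alpha_0\alpha_2), & \beta_{32} &= 2(\alpha_2\alpha_3 + \alpha_0\alpha_1), & \beta_{33} &= \alpha_0^2 - \alpha_1^2 - \alpha_2^2 + \alpha_3^2.
\end{aligned}$$

This parametrization of orthogonal transformations was first discovered by Euler (1770).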
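The explicit formulas for $\beta_{\mu\nu}$ can be checked against the definition $Tx = uxu^{-1}$. The sketch below (hypothetical helper names; quaternions stored as 4-tuples $(a_0, a_1, a_2, a_3)$ and multiplied by the table $i^2 = j^2 = k^2 = -1$, $ij = k$, $jk = i$, $ki = j$; the sample $u$ and $\xi$ are arbitrary) builds $V(u)$ from the formulas and compares $V\xi$ with the coordinates of $ux\bar u$.

```python
import math


def hamilton(p, q):
    """Product of p = (p0, p1, p2, p3) = p0 + p1 i + p2 j + p3 k and q."""
    p0, p1, p2, p3 = p
    q0, q1, q2, q3 = q
    return (
        p0 * q0 - p1 * q1 - p2 * q2 - p3 * q3,
        p0 * q1 + p1 * q0 + p2 * q3 - p3 * q2,
        p0 * q2 - p1 * q3 + p2 * q0 + p3 * q1,
        p0 * q3 + p1 * q2 - p2 * q1 + p3 * q0,
    )


def conj(p):
    return (p[0], -p[1], -p[2], -p[3])


def rotation_matrix(u):
    """V(u) = (beta_{mu nu}) from the explicit formulas; assumes n(u) = 1."""
    a0, a1, a2, a3 = u
    return [
        [a0*a0 + a1*a1 - a2*a2 - a3*a3, 2*(a1*a2 - a0*a3), 2*(a1*a3 + a0*a2)],
        [2*(a1*a2 + a0*a3), a0*a0 - a1*a1 + a2*a2 - a3*a3, 2*(a2*a3 - a0*a1)],
        [2*(a1*a3 - a0*a2), 2*(a2*a3 + a0*a1), a0*a0 - a1*a1 - a2*a2 + a3*a3],
    ]


def conjugate_by(u, xi):
    """Coordinates of u x conj(u) for the pure quaternion x = xi1 i + xi2 j + xi3 k."""
    y = hamilton(hamilton(u, (0.0, *xi)), conj(u))
    return y[1:]


raw = (1.0, 2.0, -0.5, 0.3)
length = math.sqrt(sum(c * c for c in raw))
u = tuple(c / length for c in raw)  # normalised so that n(u) = 1
V = rotation_matrix(u)
xi = (0.2, -1.0, 0.7)
eta = [sum(V[m][n] * xi[n] for n in range(3)) for m in range(3)]
print(eta)
print(conjugate_by(u, xi))  # agrees with eta up to rounding
```

Replacing $u$ by $-u$ leaves every $\beta_{\mu\nu}$ unchanged, since each is a quadratic form in the $\alpha$'s.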
We now consider the dependence of $V$ on $u$, and consequently write $V(u)$ in place of $V$. Since

$$u_1u_2\,x\,(u_1u_2)^{-1} = u_1(u_2xu_2^{-1})u_1^{-1},$$

we have

$$V(u_1u_2) = V(u_1)V(u_2).$$

Thus the map $u \mapsto V(u)$ is a ‘homomorphism’ of the multiplicative group of all quaternions of norm 1 into the group of all $3 \times 3$ real orthogonal matrices. In particular, $V(\bar u) = V(u)^{-1}$.

We show next that two quaternions $u_1, u_2$ of norm 1 yield the same orthogonal matrix if and only if $u_2 = \pm u_1$. Put $u = u_1^{-1}u_2$. Then $u_1xu_1^{-1} = u_2xu_2^{-1}$ if and only if $ux = xu$. This holds for every pure quaternion $x$ if and only if $u$ is real, i.e. if and only if $u = \pm 1$, since $n(u) = 1$.

The question arises whether all $3 \times 3$ orthogonal matrices may be represented in the above way. It follows readily from the preceding formulas for $\beta_{\mu\nu}$ that the orthogonal matrix $-I$ cannot be so represented. Consequently, if an orthogonal matrix $V$ is represented, then $-V$ is not. On the other hand, suppose $u$ is a pure quaternion, so that $\alpha_0 = 0$. Then, for any pure quaternion $x$, $ux + \bar x\bar u = ux + xu$ is real, and given by

$$ux + xu = -2(\alpha_1\xi_1 + \alpha_2\xi_2 + \alpha_3\xi_3) = -2(u, x),$$

with the notation of §10 for inner products in $\mathbb{R}^3$. It follows that

$$y = ux\bar u = 2(u, x)u - x.$$

But the mapping $x \mapsto x - 2(u, x)u$ is a reflection in the plane orthogonal to the unit vector $u$. Hence, for every reflection $R$, $-R$ is represented. It may be shown that every orthogonal transformation of $\mathbb{R}^3$ is a product of reflections. (Indeed, this is a special case of a more general result which will be proved in Proposition 17 of Chapter VII.) It follows that an orthogonal matrix $V$ is represented if and only if $V$ is a product of an even number of reflections (or, equivalently, if and only if $V$ has determinant 1, as defined in Chapter V, §1).
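The formulas $ux + xu = -2(u, x)$ and $y = ux\bar u = 2(u, x)u - x$ for a pure unit quaternion $u$ can be confirmed on a sample. In this sketch quaternions are stored as 4-tuples $(a_0, a_1, a_2, a_3)$ and multiplied by the table for $i, j, k$; the helper names and the particular $u$ and $x$ are arbitrary, hypothetical choices.

```python
def hamilton(p, q):
    """Product of quaternions p, q stored as (a0, a1, a2, a3) = a0 + a1 i + a2 j + a3 k."""
    p0, p1, p2, p3 = p
    q0, q1, q2, q3 = q
    return (
        p0 * q0 - p1 * q1 - p2 * q2 - p3 * q3,
        p0 * q1 + p1 * q0 + p2 * q3 - p3 * q2,
        p0 * q2 - p1 * q3 + p2 * q0 + p3 * q1,
        p0 * q3 + p1 * q2 - p2 * q1 + p3 * q0,
    )


def conj(p):
    return (p[0], -p[1], -p[2], -p[3])


def minus_reflection(u, x):
    """x -> 2(u, x)u - x for 3-vectors u (a unit vector) and x."""
    d = sum(a * b for a, b in zip(u, x))
    return tuple(2 * d * a - b for a, b in zip(u, x))


u = (0.0, 0.6, 0.0, 0.8)   # pure, with n(u) = 0.36 + 0.64 = 1
x = (0.0, 1.0, -2.0, 0.5)  # pure, with (u, x) = 0.6 + 0.4 = 1
y = hamilton(hamilton(u, x), conj(u))
print(y)                               # real part 0, then approximately (0.2, 2.0, 1.1)
print(minus_reflection(u[1:], x[1:]))  # approximately (0.2, 2.0, 1.1)
```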

Since, by our initial definition of quaternions, the quaternions of norm 1 are just the $2 \times 2$ unitary matrices with determinant 1, our results may be summed up (cf. Chapter X, §8) by saying that there is a homomorphism of the special unitary group $SU_2(\mathbb{C})$ onto the special orthogonal group $SO_3(\mathbb{R})$, with kernel $\{\pm I\}$. (Here ‘special’ signifies ‘determinant 1’.)

Since the quaternions of norm 1 may be identified with the points of the unit sphere $S^3$ in $\mathbb{R}^4$ it follows that, as a topological space, $SO_3(\mathbb{R})$ is homeomorphic to $S^3$ with antipodal points identified, i.e. to the projective space $P^3(\mathbb{R})$. Similarly (cf. Chapter X, §8), the topological group $SU_2(\mathbb{C})$ is the simply-connected covering space of the topological group $SO_3(\mathbb{R})$.

Again, by considering the map $T: \mathbb{H} \to \mathbb{H}$ defined by $Tx = vxu^{-1}$, where $u, v$ are quaternions with norm 1, it may be seen that there is a homomorphism of the direct product $SU_2(\mathbb{C}) \times SU_2(\mathbb{C})$ onto the special orthogonal group $SO_4(\mathbb{R})$ of $4 \times 4$ real orthogonal matrices with determinant 1, the kernel being $\{\pm(I, I)\}$.

Almost immediately after Hamilton’s invention of quaternions Graves (1844), in a letter to Hamilton, and Cayley (1845) invented ‘octonions’, also known as ‘octaves’ or ‘Cayley numbers’. We define an octonion to be an ordered pair $(a_1, a_2)$ of quaternions, with addition and multiplication defined by

$$(a_1, a_2) + (b_1, b_2) = (a_1 + b_1, a_2 + b_2),$$
$$(a_1, a_2)\cdot(b_1, b_2) = (a_1b_1 - \bar b_2a_2,\ b_2a_1 + a_2\bar b_1).$$

Then the set $\mathbb{O}$ of all octonions is a commutative group under addition, i.e. the laws (A2)–(A5) hold, with $0 = (0, 0)$ as identity element, and multiplication is both left and right distributive with respect to addition. The octonion $(1, 0)$ is a two-sided identity element for multiplication, and the octonion $\varepsilon = (0, 1)$ has the property $\varepsilon^2 = -(1, 0)$.

It is easily seen that $\alpha \in \mathbb{O}$ is in the centre of $\mathbb{O}$, i.e. $\alpha\beta = \beta\alpha$ for every $\beta \in \mathbb{O}$, if and only if $\alpha = (c, 0)$ for some $c \in \mathbb{R}$.

Since the map $a \mapsto (a, 0)$ preserves sums and products, we may regard $\mathbb{H}$ as contained in $\mathbb{O}$ by identifying the quaternion $a$ with the octonion $(a, 0)$. This shows that multiplication of octonions is in general not commutative. It is also in general not even associative; for example,

$$(ij)\varepsilon = k\varepsilon, \quad \text{whereas} \quad i(j\varepsilon) = -k\varepsilon.$$
It is for this reason that we defined octonions as ordered pairs, rather than as matrices.
It should be mentioned, however, that we could have used precisely the same con- 
struction to define complex numbers as ordered pairs of real numbers, and quaternions 
as ordered pairs of complex numbers, but the verification of the associative law for 
multiplication would then have been more laborious. 
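The pair construction, and the failure of associativity, can be reproduced in a few lines. In this sketch (hypothetical names) a quaternion is a 4-tuple $(a_0, a_1, a_2, a_3)$ multiplied by the table for $i, j, k$, and an octonion is a pair of such tuples multiplied by the rule $(a_1, a_2)(b_1, b_2) = (a_1b_1 - \bar b_2a_2, b_2a_1 + a_2\bar b_1)$ from the definition; integer entries keep the arithmetic exact.

```python
def qmul(p, q):
    """Quaternion product, with p = (p0, p1, p2, p3) = p0 + p1 i + p2 j + p3 k."""
    p0, p1, p2, p3 = p
    q0, q1, q2, q3 = q
    return (p0 * q0 - p1 * q1 - p2 * q2 - p3 * q3,
            p0 * q1 + p1 * q0 + p2 * q3 - p3 * q2,
            p0 * q2 - p1 * q3 + p2 * q0 + p3 * q1,
            p0 * q3 + p1 * q2 - p2 * q1 + p3 * q0)


def qconj(p):
    return (p[0], -p[1], -p[2], -p[3])


def qadd(p, q):
    return tuple(a + b for a, b in zip(p, q))


def qsub(p, q):
    return tuple(a - b for a, b in zip(p, q))


def omul(x, y):
    """(a1, a2)(b1, b2) = (a1 b1 - conj(b2) a2, b2 a1 + a2 conj(b1))."""
    (a1, a2), (b1, b2) = x, y
    return (qsub(qmul(a1, b1), qmul(qconj(b2), a2)),
            qadd(qmul(b2, a1), qmul(a2, qconj(b1))))


ZERO, ONE = (0, 0, 0, 0), (1, 0, 0, 0)
I_, J_, K_ = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
eps = (ZERO, ONE)


def embed(q):
    """The quaternion q regarded as the octonion (q, 0)."""
    return (q, ZERO)


left = omul(omul(embed(I_), embed(J_)), eps)   # (ij) eps
right = omul(embed(I_), omul(embed(J_), eps))  # i (j eps)
print(left, right)  # ((0, 0, 0, 0), (0, 0, 0, 1)) ((0, 0, 0, 0), (0, 0, 0, -1))
```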

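Although multiplication is non-associative, $\mathbb{O}$ does inherit some other properties from $\mathbb{H}$. If we define the conjugate of the octonion $\alpha = (a_1, a_2)$ to be the octonion $\bar\alpha = (\bar a_1, -a_2)$, then it is easily verified that

$$\overline{\alpha + \beta} = \bar\alpha + \bar\beta, \quad \overline{\alpha\beta} = \bar\beta\bar\alpha, \quad \bar{\bar\alpha} = \alpha.$$

Furthermore,

$$\alpha\bar\alpha = \bar\alpha\alpha = n(\alpha),$$

where the norm $n(\alpha) = a_1\bar a_1 + a_2\bar a_2$ is real. Moreover $n(\alpha) > 0$ if $\alpha \neq 0$, and $n(\bar\alpha) = n(\alpha)$.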
It will now be shown that if a, 8 € O anda 0, then the equation 


a= f 


has a unique solution € € O. Writing a = (aj, a2), 8 = (b1, b2) and € = (x1, x2), we 
have to solve the simultaneous quaternionic equations 


Xa, — a7x2 = by, 


a2x| + x2q) = bp. 


If we multiply the second equation on the right by a; and replace x ;a; by its value 
from the first equation, we get 


n(a)x2 = bra, — anh. 


Similarly, if we multiply the first equation on the right by ay and replace x2a7 by its 
value from the second equation, we get 


n(a)x, = bya, + agbo. 
It follows that the equation €a = f has the unique solution 
€=n(a)|Ba. 


Since the equation $\alpha\eta = \beta$ is equivalent to $\bar\eta\bar\alpha = \bar\beta$, it has the unique solution $\eta = n(\alpha)^{-1}\bar\alpha\beta$. Thus $\mathbb{O}$ is a division algebra. It should be noted that, since $\mathbb{O}$ is non-associative, it is not enough to verify that every nonzero element has a multiplicative inverse.
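The two division formulas can be tested with exact rational arithmetic. The sketch below is illustrative only: it assumes the same pair-of-quaternions product rule as the equations above, uses Python's `fractions.Fraction` for exact division by $n(\alpha)$, and the sample octonions and helper names are hypothetical.

```python
from fractions import Fraction

# Quaternion helpers: a quaternion is a 4-tuple (w, x, y, z).
def qmul(p, q):
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

def qconj(p):
    return (p[0], -p[1], -p[2], -p[3])

def qadd(p, q):
    return tuple(s + t for s, t in zip(p, q))

def qneg(p):
    return tuple(-s for s in p)

# Assumed octonion product: (a1, a2)(b1, b2) = (a1 b1 - conj(b2) a2, b2 a1 + a2 conj(b1)).
def omul(x, y):
    (a1, a2), (b1, b2) = x, y
    return (qadd(qmul(a1, b1), qneg(qmul(qconj(b2), a2))),
            qadd(qmul(b2, a1), qmul(a2, qconj(b1))))

def oconj(x):
    """Conjugate of (a1, a2) is (conj(a1), -a2)."""
    return (qconj(x[0]), qneg(x[1]))

def onorm(x):
    """n(alpha) = a1 conj(a1) + a2 conj(a2), the sum of the 8 squared coefficients."""
    return sum(c * c for part in x for c in part)

def oscale(c, x):
    return tuple(tuple(c * t for t in part) for part in x)

def solve_right(alpha, beta):
    """The xi with xi alpha = beta, namely n(alpha)^{-1} (beta conj(alpha))."""
    return oscale(Fraction(1, onorm(alpha)), omul(beta, oconj(alpha)))

def solve_left(alpha, beta):
    """The eta with alpha eta = beta, namely n(alpha)^{-1} (conj(alpha) beta)."""
    return oscale(Fraction(1, onorm(alpha)), omul(oconj(alpha), beta))

alpha = ((1, 2, 0, -1), (3, 0, 1, 1))     # sample octonion with n(alpha) = 17
beta = ((0, 1, 1, 0), (2, -1, 0, 5))
xi = solve_right(alpha, beta)
eta = solve_left(alpha, beta)
print(omul(xi, alpha) == beta, omul(alpha, eta) == beta)  # True True
```

Because $\mathbb{O}$ is non-associative, the check multiplies out $\xi\alpha$ and $\alpha\eta$ directly rather than relying on an inverse $\alpha^{-1}$ and regrouping.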

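It follows from the preceding discussion that, for all $\alpha, \beta \in \mathbb{O}$,

$$(\beta\bar\alpha)\alpha = n(\alpha)\beta = \bar\alpha(\alpha\beta).$$

Consequently the norm is multiplicative: for all $\alpha, \beta \in \mathbb{O}$,

$$n(\alpha\beta) = n(\alpha)n(\beta).$$

For, putting $\gamma = \alpha\beta$, we have

$$n(\gamma)\bar\alpha = (\bar\alpha\gamma)\bar\gamma = (\bar\alpha(\alpha\beta))\bar\gamma = n(\alpha)\beta\bar\gamma = n(\alpha)\beta(\bar\beta\bar\alpha) = n(\alpha)n(\beta)\bar\alpha.$$

This establishes the result when $\alpha \neq 0$, and when $\alpha = 0$ it is obvious.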
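Every $\alpha \in \mathbb{O}$ has a unique representation $\alpha = a_1 + a_2e$, where $a_1, a_2 \in \mathbb{H}$, and hence a unique representation

$$\alpha = c_0 + c_1i + c_2j + c_3k + c_4e + c_5ie + c_6je + c_7ke,$$

where $c_0, \ldots, c_7 \in \mathbb{R}$. Since $\bar\alpha = \bar a_1 - a_2e$ and $n(\alpha) = a_1\bar a_1 + a_2\bar a_2$, it follows that

$$\bar\alpha = c_0 - c_1i - c_2j - c_3k - c_4e - c_5ie - c_6je - c_7ke$$

and

$$n(\alpha) = c_0^2 + \cdots + c_7^2.$$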


Consequently the relation $n(\alpha)n(\beta) = n(\alpha\beta)$ may be written in the form

$$(c_0^2 + \cdots + c_7^2)(d_0^2 + \cdots + d_7^2) = e_0^2 + \cdots + e_7^2,$$

where $e_i = \sum_{j,k=0}^{7} \rho_{ijk}c_jd_k$ for some real constants $\rho_{ijk}$ which do not depend on the $c$'s and $d$'s. An '8-squares identity' of this type was first found by Degen (1818).
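Since the $e_i$ are just the coefficients of $\alpha\beta$, the 8-squares identity can be spot-checked with integer coefficients, where the arithmetic is exact. A minimal sketch, again assuming the pair-of-quaternions product rule used in this section and the coefficient order $1, i, j, k, e, ie, je, ke$; the function names and the random test data are hypothetical.

```python
import random

# Quaternion helpers: a quaternion is a 4-tuple (w, x, y, z).
def qmul(p, q):
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

def qconj(p):
    return (p[0], -p[1], -p[2], -p[3])

def qadd(p, q):
    return tuple(s + t for s, t in zip(p, q))

def qneg(p):
    return tuple(-s for s in p)

# Assumed octonion product: (a1, a2)(b1, b2) = (a1 b1 - conj(b2) a2, b2 a1 + a2 conj(b1)).
def omul(x, y):
    (a1, a2), (b1, b2) = x, y
    return (qadd(qmul(a1, b1), qneg(qmul(qconj(b2), a2))),
            qadd(qmul(b2, a1), qmul(a2, qconj(b1))))

def to_octonion(c):
    """Coefficients (c0, ..., c7) on 1, i, j, k, e, ie, je, ke -> the pair (a1, a2)."""
    return (tuple(c[:4]), tuple(c[4:]))

def eight_squares_holds(c, d):
    product = omul(to_octonion(c), to_octonion(d))
    e = list(product[0]) + list(product[1])   # the coefficients e_0, ..., e_7
    return sum(t * t for t in c) * sum(t * t for t in d) == sum(t * t for t in e)

random.seed(0)
trials = [([random.randint(-9, 9) for _ in range(8)],
           [random.randint(-9, 9) for _ in range(8)]) for _ in range(1000)]
print(all(eight_squares_holds(c, d) for c, d in trials))  # True
```

A finite number of trials is of course no proof; the proof is the argument above that the norm is multiplicative.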


7 Groups 


A nonempty set $G$ is said to be a group if a binary operation $\varphi$, i.e. a mapping $\varphi: G \times G \to G$, is defined with the properties

(i) $\varphi(\varphi(a, b), c) = \varphi(a, \varphi(b, c))$ for all $a, b, c \in G$; (associative law)
(ii) there exists $e \in G$ such that $\varphi(e, a) = a$ for every $a \in G$; (identity element)
(iii) for each $a \in G$, there exists $a^{-1} \in G$ such that $\varphi(a^{-1}, a) = e$. (inverse elements)

If, in addition,
(iv) $\varphi(a, b) = \varphi(b, a)$ for all $a, b \in G$, (commutative law)

then the group $G$ is said to be commutative or abelian.

For example, the set $\mathbb{Z}$ of all integers is a commutative group under addition, i.e. with $\varphi(a, b) = a + b$, with $0$ as identity element and $-a$ as the inverse of $a$. Similarly, the set $\mathbb{Q}^\times$ of all nonzero rational numbers is a commutative group under multiplication, i.e. with $\varphi(a, b) = ab$, with $1$ as identity element and $a^{-1}$ as the inverse of $a$.
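Axioms (i) to (iii) can be checked by brute force for a finite set with a given operation. Since $\mathbb{Z}$ and $\mathbb{Q}^\times$ are infinite, the sketch below uses finite analogues instead: the integers modulo 7 under addition and the nonzero integers modulo 7 under multiplication. The function `is_group` is a hypothetical helper, not a library routine.

```python
from itertools import product

def is_group(elements, op):
    """Brute-force check of axioms (i)-(iii): a left identity e and left inverses."""
    elems = list(elements)
    # op must actually be a map G x G -> G
    if any(op(a, b) not in elems for a, b in product(elems, repeat=2)):
        return False
    # (i) associative law
    if any(op(op(a, b), c) != op(a, op(b, c))
           for a, b, c in product(elems, repeat=3)):
        return False
    # (ii) candidates e with e*a = a for every a
    identities = [e for e in elems if all(op(e, a) == a for a in elems)]
    # (iii) for some such e, every a has an x with x*a = e
    return any(all(any(op(x, a) == e for x in elems) for a in elems)
               for e in identities)

n = 7
print(is_group(range(n), lambda a, b: (a + b) % n))     # True
print(is_group(range(1, n), lambda a, b: (a * b) % n))  # True
print(is_group(range(n), lambda a, b: (a * b) % n))     # False: 0 has no inverse
```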

We now give an example of a noncommutative group. The set $\mathcal{S}_A$ of all bijective maps $f: A \to A$ of a nonempty set $A$ to itself is a group under composition, i.e. with $\varphi(a, b) = a \circ b$, with the identity map $i_A$ as identity element and the inverse map $f^{-1}$ as the inverse of $f$. If $A$ contains at least 3 elements, then $\mathcal{S}_A$ is a noncommutative group. For suppose $a, b, c$ are distinct elements of $A$, let $f: A \to A$ be the bijective map defined by

$$f(a) = b, \quad f(b) = a, \quad f(x) = x \ \text{ if } x \neq a, b,$$

and let $g: A \to A$ be the bijective map defined by

$$g(a) = c, \quad g(c) = a, \quad g(x) = x \ \text{ if } x \neq a, c.$$

Then $f \circ g \neq g \circ f$, since $(f \circ g)(a) = c$ and $(g \circ f)(a) = b$.
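The two maps $f$ and $g$ can be written down directly as lookup tables on a three-element set. In this sketch, the set $A = \{a, b, c\}$ is represented by the strings `"a"`, `"b"`, `"c"`, and `compose` is a hypothetical helper returning $p \circ q$.

```python
f = {"a": "b", "b": "a", "c": "c"}   # swaps a and b, fixes c
g = {"a": "c", "c": "a", "b": "b"}   # swaps a and c, fixes b

def compose(p, q):
    """Return the map p o q, i.e. x -> p(q(x))."""
    return {x: p[q[x]] for x in q}

fg = compose(f, g)
gf = compose(g, f)
print(fg["a"], gf["a"])  # c b
print(fg == gf)          # False
```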

For arbitrary groups, instead of $\varphi(a, b)$ we usually write $a \cdot b$ or simply $ab$. For commutative groups, instead of $\varphi(a, b)$ we often write $a + b$.

Since, by the associative law,

$$(ab)c = a(bc),$$

we will usually dispense with brackets.

We now derive some simple properties possessed by all groups. By (iii) we have $a^{-1}a = e$. In fact also $aa^{-1} = e$. This may be seen by multiplying on the left, by the inverse of $a^{-1}$, the relation

$$a^{-1}aa^{-1} = ea^{-1} = a^{-1}.$$

By (ii) we have $ea = a$. It now follows that also $ae = a$, since

$$ae = aa^{-1}a = ea = a.$$


For all elements $a, b$ of the group $G$, the equation $ax = b$ has the solution $x = a^{-1}b$ and the equation $ya = b$ has the solution $y = ba^{-1}$. Moreover, these solutions are unique. For from $ax = ax'$ we obtain $x = x'$ by multiplying on the left by $a^{-1}$, and from $ya = y'a$ we obtain $y = y'$ by multiplying on the right by $a^{-1}$.

In particular, the identity element $e$ is unique, since it is the solution of $ea = a$, and the inverse $a^{-1}$ of $a$ is unique, since it is the solution of $a^{-1}a = e$. It follows that the inverse of $a^{-1}$ is $a$ and the inverse of $ab$ is $b^{-1}a^{-1}$.

As the preceding argument suggests, in the definition of a group we could have replaced left identity and left inverse by right identity and right inverse, i.e. we could have required $ae = a$ and $aa^{-1} = e$, instead of $ea = a$ and $a^{-1}a = e$. (However, left identity and right inverse, or right identity and left inverse, would not give the same result.)

If $H, K$ are nonempty subsets of a group $G$, we denote by $HK$ the subset of $G$ consisting of all elements $hk$, where $h \in H$ and $k \in K$. If $L$ is also a nonempty subset of $G$, then evidently

$$(HK)L = H(KL).$$


A subset $H$ of a group $G$ is said to be a subgroup of $G$ if it is a group under the same group operation as $G$ itself. A nonempty subset $H$ is a subgroup of $G$ if and only if $a, b \in H$ implies $ab^{-1} \in H$. Indeed the necessity of the condition is obvious. It is also sufficient, since it implies first $e = aa^{-1} \in H$, then $b^{-1} = eb^{-1} \in H$, and finally $ab = a(b^{-1})^{-1} \in H$. (The associative law in $H$ is inherited from $G$.)
We now show that a nonempty finite subset $H$ of a group $G$ is a subgroup of $G$ if it is closed under multiplication only. For, if $a \in H$, then $ha \in H$ for all $h \in H$. Since $H$ is finite and the mapping $h \mapsto ha$ of $H$ into itself is injective, it is also surjective by the pigeonhole principle (Corollary 1.6). Hence $ha = a$ for some $h \in H$, which shows that $H$ contains the identity element of $G$. It now further follows that $ha = e$ for some $h \in H$, which shows that $H$ is also closed under inversion.

A group is said to be finite if it contains only finitely many elements and to be of 
order n if it contains exactly n elements. 

In order to give an important example of a subgroup we digress briefly. Let $n$ be a positive integer and let $A$ be the set $\{1, 2, \ldots, n\}$ with the elements in their natural order. Since we regard $A$ as ordered, a bijective map $\alpha: A \to A$ will be called a permutation. The set of all permutations of $A$ is a group under composition, the symmetric group $\mathcal{S}_n$.

Suppose now that $n > 1$. An inversion of order induced by the permutation $\alpha$ is a pair $(i, j)$ with $i < j$ for which $\alpha(i) > \alpha(j)$. The permutation $\alpha$ is said to be even or odd according as the total number of inversions of order is even or odd. For example, the permutation $\{1, 2, 3, 4, 5\} \to \{3, 5, 4, 1, 2\}$ is odd, since there are $2 + 3 + 2 = 7$ inversions of order.
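Counting inversions is a direct double loop over pairs $i < j$. The sketch below encodes a permutation as the tuple of images $(\alpha(1), \ldots, \alpha(n))$ (an assumed encoding) and also evaluates the product formula for $\operatorname{sgn}(\alpha)$ given just below, exactly, using `fractions.Fraction` and `math.prod` (Python 3.8 or later).

```python
from fractions import Fraction
from math import prod

def inversions(perm):
    """Count inversions of order: pairs i < j with perm[i] > perm[j].
    perm lists the images alpha(1), ..., alpha(n)."""
    n = len(perm)
    return sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])

def sgn(perm):
    """+1 for an even permutation, -1 for an odd one."""
    return 1 if inversions(perm) % 2 == 0 else -1

def sgn_product(perm):
    """Exact value of the product of (alpha(j) - alpha(i)) / (j - i) over i < j."""
    n = len(perm)
    return prod(Fraction(perm[j] - perm[i], j - i)
                for i in range(n) for j in range(i + 1, n))

alpha = (3, 5, 4, 1, 2)
print(inversions(alpha), sgn(alpha), sgn_product(alpha))  # 7 -1 -1
```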

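The sign of the permutation $\alpha$ is defined by

$$\operatorname{sgn}(\alpha) = 1 \text{ or } -1 \text{ according as } \alpha \text{ is even or odd.}$$

Evidently we can write

$$\operatorname{sgn}(\alpha) = \prod_{1 \le i < j \le n} \{\alpha(j) - \alpha(i)\}/(j - i),$$

from which it follows that

$$\operatorname{sgn}(\alpha\beta) = \operatorname{sgn}(\alpha)\operatorname{sgn}(\beta).$$

Since the sign of the identity permutation is 1, this implies

$$\operatorname{sgn}(\alpha^{-1}) = \operatorname{sgn}(\alpha).$$

Thus $\operatorname{sgn}(\rho^{-1}\alpha\rho) = \operatorname{sgn}(\alpha)$ for any permutation $\rho$ of $A$, and so $\operatorname{sgn}(\alpha)$ is actually independent of the ordering of $A$.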

Since the product of two even permutations is again an even permutation, the even permutations form a subgroup of $\mathcal{S}_n$, the alternating group $\mathcal{A}_n$. The order of $\mathcal{A}_n$ is $n!/2$. For let $\tau$ be the permutation $\{1, 2, 3, \ldots, n\} \to \{2, 1, 3, \ldots, n\}$. Since there is only one inversion of order, $\tau$ is odd. Since $\tau^2$ is the identity permutation, a permutation is odd if and only if it has the form $\alpha\tau$, where $\alpha$ is even. Hence the number of odd permutations is equal to the number of even permutations.

It may be mentioned that the sign of a permutation can also be determined without actually counting the total number of inversions. In fact any $\alpha \in \mathcal{S}_n$ may be written as a product of $\sigma$ disjoint cycles (counting each fixed point as a cycle of length 1), and $\alpha$ is even or odd according as $n - \sigma$ is even or odd.
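The cycle criterion is easy to implement: follow each element around its cycle, marking elements as visited. A minimal sketch, using the same assumed encoding of a permutation as the tuple $(\alpha(1), \ldots, \alpha(n))$; for the earlier example $\{1,2,3,4,5\} \to \{3,5,4,1,2\}$ the cycles are $(1\,3\,4)(2\,5)$, so $\sigma = 2$ and $n - \sigma = 3$ is odd.

```python
def cycle_count(perm):
    """Number of disjoint cycles of perm (images of 1..n), fixed points included."""
    n = len(perm)
    seen = [False] * n
    count = 0
    for start in range(n):
        if not seen[start]:
            count += 1                 # a new cycle starts here
            k = start
            while not seen[k]:
                seen[k] = True
                k = perm[k] - 1        # move to the image (0-based index)
    return count

def sgn_by_cycles(perm):
    """+1 if n - sigma is even, -1 if it is odd."""
    return 1 if (len(perm) - cycle_count(perm)) % 2 == 0 else -1

alpha = (3, 5, 4, 1, 2)
print(cycle_count(alpha), sgn_by_cycles(alpha))  # 2 -1
```

This takes time proportional to $n$, whereas counting inversions pair by pair takes time proportional to $n^2$.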

We now return to the main story. Let $H$ be a subgroup of an arbitrary group $G$ and let $a, b$ be elements of $G$. We write $a \sim_r b$ if $ba^{-1} \in H$. We will show that this is an equivalence relation.

The relation is certainly reflexive, since $e \in H$. It is also symmetric, since if $c = ba^{-1} \in H$, then $c^{-1} = ab^{-1} \in H$. Furthermore it is transitive, since if $ba^{-1} \in H$ and $cb^{-1} \in H$, then also $ca^{-1} = (cb^{-1})(ba^{-1}) \in H$.

The equivalence class which contains $a$ is the set $Ha$ of all elements $ha$, where $h \in H$. We call any such equivalence class a right coset of the subgroup $H$, and any element of a given coset is said to be a representative of that coset.

It follows from the remarks in §0 about arbitrary equivalence relations that, for any two cosets $Ha$ and $Ha'$, either $Ha = Ha'$ or $Ha \cap Ha' = \emptyset$. Moreover, the distinct right cosets form a partition of $G$.

If $H$ is a subgroup of a finite group $G$, then $H$ is also finite and the number of distinct right cosets is finite. Moreover, each right coset $Ha$ contains the same number of elements as $H$, since the mapping $h \mapsto ha$ of $H$ to $Ha$ is bijective. It follows that the order of the subgroup $H$ divides the order of the whole group $G$, a result usually known as Lagrange's theorem. The quotient of the orders, i.e. the number of distinct cosets, is called the index of $H$ in $G$.
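Lagrange's theorem can be seen concretely in $\mathcal{S}_3$. The sketch below assumes permutations of $\{0, 1, 2\}$ encoded as tuples of images (a 0-based relabelling of $\{1, 2, 3\}$), takes $H$ to be the hypothetical example subgroup consisting of the identity and one transposition, and collects the right cosets $Ha$ as frozensets so that equal cosets merge.

```python
from itertools import permutations

def compose(p, q):
    """(p q)(x) = p(q(x)) for permutations given as tuples of images of 0..n-1."""
    return tuple(p[q[x]] for x in range(len(q)))

G = list(permutations(range(3)))       # the symmetric group S_3, of order 6
H = [(0, 1, 2), (1, 0, 2)]             # identity and the transposition of 0 and 1

# Right cosets Ha = {h a : h in H}; frozensets make equal cosets coincide.
cosets = {frozenset(compose(h, a) for h in H) for a in G}

print(len(cosets))                                         # 3, the index of H in G
print(all(len(c) == len(H) for c in cosets))               # True
print(sorted(x for c in cosets for x in c) == sorted(G))   # True: a partition of G
```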

Suppose again that $H$ is a subgroup of an arbitrary group $G$ and that $a, b \in G$. By writing $a \sim_l b$ if $a^{-1}b \in H$, we obtain another equivalence relation. The equivalence class which contains $a$ is now the set $aH$ of all elements $ah$, where $h \in H$. We call any such equivalence class a left coset of the subgroup $H$. Again, two left cosets either coincide or are disjoint, and the distinct left cosets form a partition of $G$.

When are the two partitions, into left cosets and into right cosets, the same? Evidently $Ha = aH$ for every $a \in G$ if and only if $a^{-1}Ha = H$ for every $a \in G$ or, since $a$ may be replaced by $a^{-1}$, if and only if $a^{-1}ha \in H$ for every $h \in H$ and every $a \in G$. A subgroup $H$ which satisfies this condition is said to be 'invariant' or normal.

Any group G obviously has two normal subgroups, namely G itself and the subset 
{e} which contains only the identity element. A group G is said to be simple if it has 
no other normal subgroups and if these two are distinct (i.e., G contains more than one 
element). 

We now show that if $H$ is a normal subgroup of a group $G$, then the collection of all cosets of $H$ can be given the structure of a group. Since $Ha = aH$ and $HH = H$, we have

$$(Ha)(Hb) = H(aH)b = H(Ha)b = Hab.$$

Thus if we define the product $Ha \cdot Hb$ of the cosets $Ha$ and $Hb$ to be the coset $Hab$, the definition does not depend on the choice of coset representatives. Clearly multiplication of cosets is associative, the coset $H = He$ is an identity element and the coset $Ha^{-1}$ is an inverse of the coset $Ha$. The new group thus constructed is called the factor group or quotient group of $G$ by the normal subgroup $H$, and is denoted by $G/H$.
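In a commutative group every subgroup is normal, so additive examples are the simplest place to watch coset multiplication (written additively as $(H + a) + (H + b) = H + (a + b)$) being independent of the representatives. A minimal sketch with the hypothetical choice $G = \mathbb{Z}/12\mathbb{Z}$ and $H = \{0, 4, 8\}$; the helper `add_cosets` tries every pair of representatives and asserts that they all give the same coset.

```python
n = 12
H = frozenset({0, 4, 8})                 # a subgroup of the additive group Z/12Z

def coset(a):
    """The coset H + a, as a frozenset of residues mod n."""
    return frozenset((h + a) % n for h in H)

cosets = {coset(a) for a in range(n)}    # the elements of the factor group

def add_cosets(C, D):
    """Add two cosets, checking every choice of representatives a in C, b in D."""
    results = {coset(a + b) for a in C for b in D}
    assert len(results) == 1             # the sum does not depend on the choice
    return results.pop()

print(len(cosets))                                   # 4
print(add_cosets(coset(1), coset(2)) == coset(3))    # True
```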

A mapping $f: G \to G'$ of a group $G$ into a group $G'$ is said to be a (group) homomorphism if

$$f(ab) = f(a)f(b) \quad \text{for all } a, b \in G.$$

By taking $a = b = e$, we see that this implies that $f(e) = e'$ is the identity element of $G'$. By taking $b = a^{-1}$, it now follows that $f(a^{-1})$ is the inverse of $f(a)$ in $G'$. Since the subset $f(G)$ of $G'$ is closed under both multiplication and inversion, it is a subgroup of $G'$.
If $g: G' \to G''$ is a homomorphism of the group $G'$ into a group $G''$, then the composite map $g \circ f: G \to G''$ is also a homomorphism.

The kernel of the homomorphism $f$ is defined to be the set $N$ of all $a \in G$ such that $f(a) = e'$ is the identity element of $G'$. The kernel is a subgroup of $G$, since if $a \in N$ and $b \in N$, then $ab \in N$ and $a^{-1} \in N$. Moreover, it is a normal subgroup, since $a \in N$ and $c \in G$ imply $c^{-1}ac \in N$.

For any $a \in G$, put $a' = f(a) \in G'$. The coset $Na$ is the set of all $x \in G$ such that $f(x) = a'$, and the map $Na \mapsto a'$ is a bijection from the collection of all cosets of $N$ to $f(G)$. Since $f$ is a homomorphism, $Nab$ is mapped to $a'b'$. Hence the map $Na \mapsto a'$ is a homomorphism of the factor group $G/N$ to $f(G)$.

A mapping $f: G \to G'$ of a group $G$ into a group $G'$ is said to be a (group) isomorphism if it is both bijective and a homomorphism. The inverse mapping $f^{-1}: G' \to G$ is then also an isomorphism. (An automorphism of a group $G$ is an isomorphism of $G$ with itself.)

Thus we have shown that, if $f: G \to G'$ is a homomorphism of a group $G$ into a group $G'$, with kernel $N$, then the factor group $G/N$ is isomorphic to $f(G)$.

Suppose now that $G$ is an arbitrary group and $a$ any element of $G$. We have already defined $a^{-1}$, the inverse of $a$. We now inductively define $a^n$, for any integer $n$, by putting

$$a^0 = e, \qquad a^1 = a,$$

$$a^n = a(a^{n-1}), \qquad a^{-n} = a^{-1}(a^{-1})^{n-1} \qquad \text{if } n > 1.$$

It is readily verified that, for all $m, n \in \mathbb{Z}$,

$$a^ma^n = a^{m+n}, \qquad (a^m)^n = a^{mn}.$$


The set $\langle a\rangle = \{a^n : n \in \mathbb{Z}\}$ is a commutative subgroup of $G$, the cyclic subgroup generated by $a$. Evidently $\langle a\rangle$ contains $a$ and is contained in every subgroup of $G$ which contains $a$.

If we regard $\mathbb{Z}$ as a group under addition, then the mapping $n \mapsto a^n$ is a homomorphism of $\mathbb{Z}$ onto $\langle a\rangle$. Consequently $\langle a\rangle$ is isomorphic to the factor group $\mathbb{Z}/N$, where $N$ is the subgroup of $\mathbb{Z}$ consisting of all integers $n$ such that $a^n = e$. Evidently $0 \in N$, and $n \in N$ implies $-n \in N$. Thus either $N = \{0\}$ or $N$ contains a positive integer. In the latter case, let $s$ be the least positive integer in $N$. By Proposition 14, for any integer $n$ there exist integers $q, r$ such that

$$n = qs + r, \qquad 0 \le r < s.$$

If $n \in N$, then also $r = n - qs \in N$ and hence $r = 0$, by the definition of $s$. It follows that $N = s\mathbb{Z}$ is the subgroup of $\mathbb{Z}$ consisting of all multiples of $s$. Thus either $\langle a\rangle$ is isomorphic to $\mathbb{Z}$, and is an infinite group, or $\langle a\rangle$ is isomorphic to the factor group $\mathbb{Z}/s\mathbb{Z}$, and is a finite group of order $s$. We say that the element $a$ itself is of infinite order if $\langle a\rangle$ is infinite and of order $s$ if $\langle a\rangle$ is of order $s$.
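For an element of a finite group, the order $s$ can be found by multiplying by $a$ until $e$ is reached. A minimal sketch in the multiplicative group of nonzero residues modulo a prime, chosen here only as a convenient example; `element_order` is a hypothetical helper.

```python
from math import gcd

def element_order(a, n):
    """Least s >= 1 with a^s = 1 in the group of units mod n."""
    if n < 2 or gcd(a, n) != 1:
        raise ValueError("need n >= 2 and a a unit mod n")
    s, power = 1, a % n
    while power != 1:
        power = (power * a) % n
        s += 1
    return s

print(element_order(2, 7))   # 3, since 2^3 = 8 = 1 (mod 7)
print(element_order(3, 7))   # 6, so 3 generates the whole group of order 6
```

As the argument above predicts, the exponents $k$ with $2^k \equiv 1 \pmod 7$ are exactly the multiples of $3$.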

It is easily seen that in a commutative group the set of all elements of finite order 
is a subgroup, called its torsion subgroup. 

If $S$ is any nonempty subset of a group $G$, then the set $\langle S\rangle$ of all finite products $a_1^{\epsilon_1}a_2^{\epsilon_2}\cdots a_n^{\epsilon_n}$, where $n \in \mathbb{N}$, $a_j \in S$ and $\epsilon_j = \pm 1$, is a subgroup of $G$, called the subgroup generated by $S$. Clearly $S \subseteq \langle S\rangle$ and $\langle S\rangle$ is contained in every subgroup of $G$ which contains $S$.

Two elements $a, b$ of a group $G$ are said to be conjugate if $b = x^{-1}ax$ for some $x \in G$. It is easy to see that conjugacy is an equivalence relation. For $a = a^{-1}aa$, if $b = x^{-1}ax$ then $a = (x^{-1})^{-1}bx^{-1}$, and $b = x^{-1}ax$, $c = y^{-1}by$ together imply $c = (xy)^{-1}axy$. Consequently $G$ may be partitioned into conjugacy classes, so that two elements of $G$ are conjugate if and only if they belong to the same conjugacy class.

For any element $a$ of a group $G$, the set $N_a$ of all elements of $G$ which commute with $a$,

$$N_a = \{x \in G : xa = ax\},$$

is closed under multiplication and inversion. Thus $N_a$ is a subgroup of $G$, called the centralizer of $a$ in $G$.

If $y$ and $z$ lie in the same right coset of $N_a$, so that $z = xy$ for some $x \in N_a$, then $zy^{-1}a = azy^{-1}$ and hence $y^{-1}ay = z^{-1}az$. Conversely, if $y^{-1}ay = z^{-1}az$, then $y$ and $z$ lie in the same right coset of $N_a$. If $G$ is finite, it follows that the number of elements in the conjugacy class containing $a$ is equal to the number of right cosets of the subgroup $N_a$, i.e. to the index of the subgroup $N_a$ in $G$, and hence it divides the order of $G$.
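The equality between the size of a conjugacy class and the index of the centralizer can be checked by brute force in a small symmetric group. The sketch below assumes permutations of $\{0, 1, 2, 3\}$ encoded as tuples of images and takes $a$ to be the transposition of $0$ and $1$, an arbitrary choice; the helper names are hypothetical.

```python
from itertools import permutations

def compose(p, q):
    """(p q)(x) = p(q(x))."""
    return tuple(p[q[x]] for x in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for x, y in enumerate(p):
        inv[y] = x
    return tuple(inv)

G = list(permutations(range(4)))        # S_4, of order 24
a = (1, 0, 2, 3)                        # the transposition of 0 and 1

conj_class = {compose(inverse(x), compose(a, x)) for x in G}    # all x^{-1} a x
centralizer = [x for x in G if compose(x, a) == compose(a, x)]  # N_a

print(len(conj_class), len(centralizer))              # 6 4
print(len(conj_class) == len(G) // len(centralizer))  # True
```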

To conclude, we mention a simple way of creating new groups from given ones. Let $G, G'$ be groups and let $G \times G'$ be the set of all ordered pairs $(a, a')$ with $a \in G$ and $a' \in G'$. Then $G \times G'$ acquires the structure of a group if we define the product $(a, a') \cdot (b, b')$ of $(a, a')$ and $(b, b')$ to be $(ab, a'b')$. Multiplication is clearly associative, $(e, e')$ is an identity element and $(a^{-1}, a'^{-1})$ is an inverse for $(a, a')$. The group thus constructed is called the direct product of $G$ and $G'$, and is again denoted by $G \times G'$.


8 Rings and Fields 


A nonempty set $R$ is said to be a ring if two binary operations, $+$ (addition) and $\cdot$ (multiplication), are defined with the properties

(i) $R$ is a commutative group under addition, with $0$ (zero) as identity element and $-a$ as inverse of $a$;
(ii) multiplication is associative: $(ab)c = a(bc)$ for all $a, b, c \in R$;
(iii) there exists an identity element $1$ (one) for multiplication: $a1 = a = 1a$ for every $a \in R$;
(iv) addition and multiplication are connected by the two distributive laws:

$$(a + b)c = (ac) + (bc), \qquad c(a + b) = (ca) + (cb) \qquad \text{for all } a, b, c \in R.$$

The elements $0$ and $1$ are necessarily uniquely determined. If, in addition, multiplication is commutative:

$$ab = ba \qquad \text{for all } a, b \in R,$$

then $R$ is said to be a commutative ring. In a commutative ring either one of the two distributive laws implies the other.
It may seem inconsistent to require that addition is commutative, but not multiplication. However, the commutative law for addition is actually a consequence of the other axioms for a ring. For, by the first distributive law we have

$$(a + b)(1 + 1) = a(1 + 1) + b(1 + 1) = a + a + b + b,$$

and by the second distributive law

$$(a + b)(1 + 1) = (a + b)1 + (a + b)1 = a + b + a + b.$$

Since a ring is a group under addition, by comparing these two relations we obtain first

$$a + a + b = a + b + a$$

and then $a + b = b + a$.

As examples, the set $\mathbb{Z}$ of all integers is a commutative ring, with the usual definitions of addition and multiplication, whereas if $n > 1$, the set $M_n(\mathbb{Z})$ of all $n \times n$ matrices with entries from $\mathbb{Z}$ is a noncommutative ring, with the usual definitions of matrix addition and multiplication.

A very different example is the collection $\mathcal{P}(X)$ of all subsets of a given set $X$. If we define the sum $A + B$ of two subsets $A, B$ of $X$ to be their symmetric difference, i.e. the set of all elements of $X$ which are in either $A$ or $B$, but not in both:

$$A + B = (A \cup B) \setminus (A \cap B) = (A \cup B) \cap (A^c \cup B^c),$$

and the product $AB$ to be the set of all elements of $X$ which are in both $A$ and $B$:

$$AB = A \cap B,$$

it is not difficult to verify that $\mathcal{P}(X)$ is a commutative ring, with the empty set $\emptyset$ as identity element for addition and the whole set $X$ as identity element for multiplication. For every $A \in \mathcal{P}(X)$, we also have

$$A + A = \emptyset, \qquad AA = A.$$

The set operations are in turn determined by the ring operations:

$$A \cup B = A + B + AB, \qquad A \cap B = AB, \qquad A^c = A + X.$$
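Python's `frozenset` operators `^` (symmetric difference), `&` (intersection), `|` (union) and `-` (difference) make these identities easy to try out. A minimal sketch with the hypothetical choice $X = \{1, 2, 3, 4\}$; the helper names `add` and `mul` stand for the ring operations just defined.

```python
X = frozenset({1, 2, 3, 4})
empty = frozenset()

def add(A, B):
    """Ring sum: the symmetric difference of A and B."""
    return A ^ B

def mul(A, B):
    """Ring product: the intersection of A and B."""
    return A & B

A = frozenset({1, 2})
B = frozenset({2, 3})

print(add(A, A) == empty, mul(A, A) == A)        # True True
print((A | B) == add(add(A, B), mul(A, B)))      # True: union = A + B + AB
print((X - A) == add(A, X))                      # True: complement = A + X
```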


A ring $R$ is said to be a Boolean ring if $aa = a$ for every $a \in R$. It follows that $a + a = 0$ for every $a \in R$, since

$$a + a = (a + a)(a + a) = a + a + a + a.$$

Moreover, a Boolean ring is commutative, since

$$a + b = (a + b)(a + b) = a + b + ab + ba,$$

so that $ab = -ba$, and $-ba = ba$ by what we have already proved.

For an arbitrary set $X$, any nonempty subset of $\mathcal{P}(X)$ which is closed under union, intersection and complementation can be given the structure of a Boolean ring in the manner just described. It was proved by Stone (1936) that every Boolean ring may be obtained in this way. Thus the algebraic laws of set theory may be replaced by the more familiar laws of algebra and all such laws are consequences of a small number among them.

We now return to arbitrary rings. In the same way as for $\mathbb{Z}$, in any ring $R$ we have

$$a0 = 0 = 0a \qquad \text{for every } a$$

and

$$(-a)b = -(ab) = a(-b) \qquad \text{for all } a, b.$$

It follows that $R$ contains only one element if $1 = 0$. We will say that the ring $R$ is 'trivial' in this case.

Suppose $R$ is a nontrivial ring. Then, viewing $R$ as a group under addition, the cyclic subgroup $\langle 1\rangle$ is either infinite, and isomorphic to $\mathbb{Z}/0\mathbb{Z}$, or finite of order $s$, and isomorphic to $\mathbb{Z}/s\mathbb{Z}$ for some positive integer $s$. The ring $R$ is said to have characteristic $0$ in the first case and characteristic $s$ in the second case.

For any positive integer $n$, write

$$na := a + \cdots + a \qquad (n \text{ summands}).$$

If $R$ has characteristic $s > 0$, then $sa = 0$ for every $a \in R$, since

$$sa = (1 + \cdots + 1)a = 0a = 0.$$

On the other hand, $n1 \neq 0$ for every positive integer $n < s$, by the definition of characteristic.

An element $a$ of a nontrivial ring $R$ is said to be 'invertible' or a unit if there exists an element $a^{-1}$ such that

$$a^{-1}a = 1 = aa^{-1}.$$

The element $a^{-1}$ is then uniquely determined and is called the inverse of $a$. For example, $1$ is a unit and is its own inverse. If $a$ is a unit, then $a^{-1}$ is also a unit and its inverse is $a$. If $a$ and $b$ are units, then $ab$ is also a unit and its inverse is $b^{-1}a^{-1}$. It follows that the set $R^\times$ of all units is a group under multiplication.
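For a finite commutative ring the units can be listed by brute force. The sketch below uses the ring of integers modulo 12 as a hypothetical example, since it has zero divisors as well as units, and checks that the units are closed under multiplication, as they must be if they form the group $R^\times$.

```python
def units(n):
    """Units of the ring Z/nZ: residues a with some b such that ab = 1 (mod n)."""
    return [a for a in range(n) if any((a * b) % n == 1 for b in range(n))]

U = units(12)
print(U)                                                  # [1, 5, 7, 11]
print(all((a * b) % 12 in U for a in U for b in U))       # True: closed under products
```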

A nontrivial ring R in which every nonzero element is invertible is said to be a 
division ring. Thus all nonzero elements of a division ring form a group under multipli- 
cation, the multiplicative group of the division ring. A field is a commutative division 
ring. 

A nontrivial commutative ring $R$ is said to be an integral domain if it has no 'divisors of zero', i.e. if $a \neq 0$ and $b \neq 0$ imply $ab \neq 0$. A division ring also has no divisors of zero, since if $a \neq 0$ and $b \neq 0$, then $a^{-1}ab = b \neq 0$, and hence $ab \neq 0$.

As examples, the set Q of rational numbers, the set R of real numbers and the set 
C of complex numbers are all fields, with the usual definitions of addition and mul- 
tiplication. The set H of quaternions is a division ring, and the set Z of integers is an 
integral domain, but neither is a field. 
In a ring with no divisors of zero, the additive order of any nonzero element $a$ is the same as the additive order of $1$, since $ma = (m1)a = 0$ if and only if $m1 = 0$. Furthermore, the characteristic of such a ring is either $0$ or a prime number. For assume $n = lm$, where $l$ and $m$ are positive integers less than $n$. If $n1 = 0$, then

$$(l1)(m1) = n1 = 0.$$

Since there are no divisors of zero, either $l1 = 0$ or $m1 = 0$, and hence the characteristic cannot be $n$.

A subset $S$ of a ring $R$ is said to be a (two-sided) ideal if it is a subgroup of $R$ under addition and if, for every $a \in S$ and $c \in R$, both $ac \in S$ and $ca \in S$.

Any ring R has two obvious ideals, namely R itself and the subset {0}. It is said to 
be simple if it has no other ideals and is nontrivial. 

Any division ring is simple. For if an ideal $S$ of a division ring $R$ contains $a \neq 0$, then for every $c \in R$ we have $c = (ca^{-1})a \in S$.

Conversely, if a commutative ring $R$ is simple, then it is a field. For, if $a$ is any nonzero element of $R$, the set

$$S_a = \{xa : x \in R\}$$

is an ideal (since $R$ is commutative). Since $S_a$ contains $1a = a \neq 0$, we must have $S_a = R$. Hence $1 = xa$ for some $x \in R$. Thus every nonzero element of $R$ is invertible.

If $R$ is a commutative ring and $a_1, \ldots, a_m \in R$, then the set $S$ consisting of all elements $x_1a_1 + \cdots + x_ma_m$, where $x_j \in R$ $(1 \le j \le m)$, is clearly an ideal of $R$, the ideal generated by $a_1, \ldots, a_m$. An ideal of this type is said to be finitely generated.

We now show that if $S$ is an ideal of the ring $R$, then the set of all cosets $S + a$ of $S$ can be given the structure of a ring. The ring $R$ is a commutative group under addition. Hence, as we saw in §7, this set acquires the structure of a (commutative) group under addition if we define the sum of $S + a$ and $S + b$ to be $S + (a + b)$. If $x = s + a$ and $x' = s' + b$ for some $s, s' \in S$, then $xx' = s'' + ab$, where $s'' = ss' + as' + sb$. Since $S$ is an ideal, $s'' \in S$. Thus without ambiguity we may define the product of the cosets $S + a$ and $S + b$ to be the coset $S + ab$. Evidently multiplication is associative, $S + 1$ is an identity element for multiplication and both distributive laws hold. The new ring thus constructed is called the quotient ring of $R$ by the ideal $S$, and is denoted by $R/S$.

A mapping $f: R \to R'$ of a ring $R$ into a ring $R'$ is said to be a (ring) homomorphism if, for all $a, b \in R$,

$$f(a + b) = f(a) + f(b), \qquad f(ab) = f(a)f(b),$$

and if $f(1) = 1'$ is the identity element for multiplication in $R'$.

The kernel of the homomorphism $f$ is the set $N$ of all $a \in R$ such that $f(a) = 0'$ is the identity element for addition in $R'$. The kernel is an ideal of $R$, since it is a subgroup under addition and since $a \in N$, $c \in R$ imply $ac \in N$ and $ca \in N$.

For any $a \in R$, put $a' = f(a) \in R'$. The coset $N + a$ is the set of all $x \in R$ such that $f(x) = a'$, and the map $N + a \mapsto a'$ is a bijection from the collection of all cosets of $N$ to $f(R)$. Since $f$ is a homomorphism, $N + (a + b)$ is mapped to $a' + b'$ and $N + ab$ is mapped to $a'b'$. Hence the map $N + a \mapsto a'$ is also a homomorphism of the quotient ring $R/N$ into $f(R)$.
A mapping $f: R \to R'$ of a ring $R$ into a ring $R'$ is said to be a (ring) isomorphism if it is both bijective and a homomorphism. The inverse mapping $f^{-1}: R' \to R$ is then also an isomorphism. (An automorphism of a ring $R$ is an isomorphism of $R$ with itself.)

Thus we have shown that, if $f: R \to R'$ is a homomorphism of a ring $R$ into a ring $R'$, with kernel $N$, then the quotient ring $R/N$ is isomorphic to $f(R)$.

An ideal $M$ of a ring $R$ is said to be maximal if $M \neq R$ and if there are no ideals $S$ such that $M \subsetneq S \subsetneq R$.

Let $M$ be an ideal of the ring $R$. If $S$ is an ideal of $R$ which contains $M$, then the set $S'$ of all cosets $M + a$ with $a \in S$ is an ideal of $R/M$. Conversely, if $S'$ is an ideal of $R/M$, then the set $S$ of all $a \in R$ such that $M + a \in S'$ is an ideal of $R$ which contains $M$. It follows that $M$ is a maximal ideal of $R$ if and only if $R/M$ is simple. Hence an ideal $M$ of a commutative ring $R$ is maximal if and only if the quotient ring $R/M$ is a field.
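For $R = \mathbb{Z}$ and $M = n\mathbb{Z}$ (with $n \ge 2$), the criterion says that $n\mathbb{Z}$ is maximal exactly when $\mathbb{Z}/n\mathbb{Z}$ is a field. The sketch below tests the field property by brute force and compares it with primality, using the familiar fact that $n\mathbb{Z}$ is maximal precisely for prime $n$; the helper names are hypothetical.

```python
from math import isqrt

def is_field_mod(n):
    """True if every nonzero residue mod n (n >= 2) has a multiplicative inverse."""
    return all(any((a * b) % n == 1 for b in range(1, n)) for a in range(1, n))

def is_prime(n):
    return n >= 2 and all(n % d != 0 for d in range(2, isqrt(n) + 1))

print([n for n in range(2, 20) if is_field_mod(n)])               # [2, 3, 5, 7, 11, 13, 17, 19]
print(all(is_field_mod(n) == is_prime(n) for n in range(2, 50)))  # True
```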

To conclude, we mention a simple way of creating new rings from given ones. Let $R, R'$ be rings and let $R \times R'$ be the set of all ordered pairs $(a, a')$ with $a \in R$ and $a' \in R'$. As we saw in the previous section, $R \times R'$ acquires the structure of a (commutative) group under addition if we define the sum $(a, a') + (b, b')$ of $(a, a')$ and $(b, b')$ to be $(a + b, a' + b')$. If we define their product $(a, a') \cdot (b, b')$ to be $(ab, a'b')$, then $R \times R'$ becomes a ring, with $(0, 0')$ as identity element for addition and $(1, 1')$ as identity element for multiplication. The ring thus constructed is called the direct sum of $R$ and $R'$, and is denoted by $R \oplus R'$.
Although we assume some knowledge of linear algebra, it may be useful to place the basic definitions and results in the context of the preceding sections. A set $V$ is said to be a vector space over a division ring $D$ if it is a commutative group under an operation $+$ (addition) and there exists a map $\varphi: D \times V \to V$ (multiplication by a scalar) such that, if $\varphi(\alpha, v)$ is denoted by $\alpha v$ then, for all $\alpha, \beta \in D$ and all $v, w \in V$,

(i) $\alpha(v + w) = \alpha v + \alpha w$,
(ii) $(\alpha + \beta)v = \alpha v + \beta v$,
(iii) $(\alpha\beta)v = \alpha(\beta v)$,
(iv) $1v = v$,

where $1$ is the identity element for multiplication in $D$. The elements of $V$ will be called vectors and the elements of $D$ scalars.

For example, for any positive integer $n$, the set $D^n$ of all $n$-tuples of elements of the division ring $D$ is a vector space over $D$ if addition and multiplication by a scalar are defined by

$$(\alpha_1, \ldots, \alpha_n) + (\beta_1, \ldots, \beta_n) = (\alpha_1 + \beta_1, \ldots, \alpha_n + \beta_n),$$

$$\lambda(\alpha_1, \ldots, \alpha_n) = (\lambda\alpha_1, \ldots, \lambda\alpha_n).$$

The special cases $D = \mathbb{R}$ and $D = \mathbb{C}$ have many applications.
As another example, the set $\mathcal{C}(I)$ of all continuous functions $f: I \to \mathbb{R}$, where $I$ is an interval of the real line, is a vector space over the field $\mathbb{R}$ of real numbers if addition and multiplication by a scalar are defined, for every $t \in I$, by

$$(f + g)(t) = f(t) + g(t), \qquad (\alpha f)(t) = \alpha f(t).$$


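Let $V$ be an arbitrary vector space over a division ring $D$. If $O$ is the identity element of $V$ with respect to addition, then

$$\alpha O = O \qquad \text{for every } \alpha \in D,$$

since $\alpha O = \alpha(O + O) = \alpha O + \alpha O$. Similarly, if $0$ is the identity element of $D$ with respect to addition, then

$$0v = O \qquad \text{for every } v \in V,$$

since $0v = (0 + 0)v = 0v + 0v$. Furthermore,

$$(-\alpha)v = -(\alpha v) \qquad \text{for all } \alpha \in D \text{ and } v \in V,$$

since $O = 0v = (\alpha + (-\alpha))v = \alpha v + (-\alpha)v$, and

$$\alpha v \neq O \qquad \text{if } \alpha \neq 0 \text{ and } v \neq O,$$

since $\alpha^{-1}(\alpha v) = (\alpha^{-1}\alpha)v = 1v = v$.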

From now on we will denote the zero elements of D and V by the same symbol 0. 
This is easier on the eye and in practice is not confusing. 

A subset $U$ of a vector space $V$ is said to be a subspace of $V$ if it is a vector space under the same operations as $V$ itself. It is easily seen that a nonempty subset $U$ is a subspace of $V$ if (and only if) it is closed under addition and multiplication by a scalar. For then, if $u \in U$, also $-u = (-1)u \in U$, and so $U$ is an additive subgroup of $V$. The other requirements for a vector space are simply inherited from $V$.

For example, if $1 \le m \le n$, the set of all $(\alpha_1, \ldots, \alpha_n) \in D^n$ with $\alpha_1 = \cdots = \alpha_m = 0$ is a subspace of $D^n$. Also, the set $\mathcal{C}^1(I)$ of all continuously differentiable functions $f: I \to \mathbb{R}$ is a subspace of $\mathcal{C}(I)$. Two obvious subspaces of any vector space $V$ are $V$ itself and the subset $\{0\}$ which contains only the zero vector.

If $U_1$ and $U_2$ are subspaces of a vector space $V$, then their intersection $U_1 \cap U_2$, which necessarily contains $0$, is again a subspace of $V$. The sum $U_1 + U_2$, consisting of all vectors $u_1 + u_2$ with $u_1 \in U_1$ and $u_2 \in U_2$, is also a subspace of $V$. Evidently $U_1 + U_2$ contains $U_1$ and $U_2$ and is contained in every subspace of $V$ which contains both $U_1$ and $U_2$. If $U_1 \cap U_2 = \{0\}$, the sum $U_1 + U_2$ is said to be direct, and is denoted by $U_1 \oplus U_2$, since it may be identified with the set of all ordered pairs $(u_1, u_2)$, where $u_1 \in U_1$ and $u_2 \in U_2$.

Let $V$ be an arbitrary vector space over a division ring $D$ and let $\{v_1, \ldots, v_m\}$ be a finite subset of $V$. A vector $v$ in $V$ is said to be a linear combination of $v_1, \ldots, v_m$ if

$$v = \alpha_1v_1 + \cdots + \alpha_mv_m$$

for some $\alpha_1, \ldots, \alpha_m \in D$. The coefficients $\alpha_1, \ldots, \alpha_m$ need not be uniquely determined. Evidently a vector $v$ is a linear combination of $v_1, \ldots, v_m$ if it is a linear combination of some proper subset, since we can add the remaining vectors with zero coefficients.

If $S$ is any nonempty subset of $V$, then the set $\langle S\rangle$ of all vectors in $V$ which are linear combinations of finitely many elements of $S$ is a subspace of $V$, the subspace 'spanned' or generated by $S$. Clearly $S \subseteq \langle S\rangle$ and $\langle S\rangle$ is contained in every subspace of $V$ which contains $S$.

A finite subset $\{v_1, \ldots, v_m\}$ of $V$ is said to be linearly dependent (over $D$) if there exist $\alpha_1, \ldots, \alpha_m \in D$, not all zero, such that

$$\alpha_1v_1 + \cdots + \alpha_mv_m = 0,$$

and is said to be linearly independent otherwise.

For example, in $\mathbb{R}^3$ the vectors

$$v_1 = (1, 0, 1), \quad v_2 = (0, 1, 0), \quad v_3 = (1/2, 1/2, 1/2)$$

are linearly dependent, since $v_1 + v_2 - 2v_3 = 0$. On the other hand, the vectors

$$e_1 = (1, 0, 0), \quad e_2 = (0, 1, 0), \quad e_3 = (0, 0, 1)$$

are linearly independent, since $\alpha_1e_1 + \alpha_2e_2 + \alpha_3e_3 = (\alpha_1, \alpha_2, \alpha_3)$, and this is $0$ only if $\alpha_1 = \alpha_2 = \alpha_3 = 0$.
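For vectors with rational coordinates, linear independence can be decided mechanically by Gaussian elimination: $m$ vectors are linearly independent exactly when the matrix having them as rows has rank $m$. The sketch below is one such routine, using exact `fractions.Fraction` arithmetic so that no rounding occurs; `rank` and `independent` are hypothetical helper names.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors over Q, by Gaussian elimination with exact fractions."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0                                   # number of pivot rows found so far
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):          # clear this column in every other row
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def independent(vectors):
    return rank(vectors) == len(vectors)

half = Fraction(1, 2)
v1, v2, v3 = (1, 0, 1), (0, 1, 0), (half, half, half)
e1, e2, e3 = (1, 0, 0), (0, 1, 0), (0, 0, 1)
print(independent([v1, v2, v3]))   # False
print(independent([e1, e2, e3]))   # True
```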
In any vector space $V$, the set $\{v\}$ containing the single vector $v$ is linearly independent if $v \neq 0$ and linearly dependent if $v = 0$. If $v_1, \ldots, v_m$ are linearly independent, then any vector $v \in \langle v_1, \ldots, v_m\rangle$ has a unique representation as a linear combination of $v_1, \ldots, v_m$, since if

$$\alpha_1v_1 + \cdots + \alpha_mv_m = \beta_1v_1 + \cdots + \beta_mv_m,$$

then

$$(\alpha_1 - \beta_1)v_1 + \cdots + (\alpha_m - \beta_m)v_m = 0$$

and hence

$$\alpha_1 - \beta_1 = \cdots = \alpha_m - \beta_m = 0.$$

Evidently the vectors $v_1, \ldots, v_m$ are linearly dependent if some proper subset is linearly dependent. Hence any nonempty subset of a linearly independent set is again linearly independent.

A subset $S$ of a vector space $V$ is said to be a basis for $V$ if $S$ is linearly independent and $\langle S\rangle = V$. In the previous example, the vectors $e_1, e_2, e_3$ are a basis for $\mathbb{R}^3$, since they are not only linearly independent but also generate $\mathbb{R}^3$.

Any nontrivial finitely generated vector space has a basis. In fact if a vector space $V$ is generated by a finite subset $T$, then $V$ has a basis $B \subseteq T$. Moreover, any linearly independent subset of $V$ is also finite and its cardinality does not exceed that of $T$. It follows that any two bases contain the same number of elements.
If $V$ has a basis containing $n$ elements, we say $V$ has dimension $n$ and we write $\dim V = n$. We say that $V$ has infinite dimension if it is not finitely generated, and has dimension $0$ if it contains only the vector $0$.

For example, the field C of complex numbers may be regarded as a 2-dimensional 
vector space over the field R of real numbers, with basis {1, i}. 

Again, $D^n$ has dimension $n$ as a vector space over the division ring $D$, since it has the basis

$$e_1 = (1, 0, \ldots, 0), \quad e_2 = (0, 1, \ldots, 0), \quad \ldots, \quad e_n = (0, 0, \ldots, 1).$$


On the other hand, the real vector space $\mathcal{C}(I)$ of all continuous functions $f: I \to \mathbb{R}$ has infinite dimension if the interval $I$ contains more than one point since, for any positive integer $n$, the real polynomials of degree less than $n$ form an $n$-dimensional subspace.

The first of these examples is readily generalized. If $E$ and $F$ are fields with $F \subseteq E$, we can regard $E$ as a vector space over $F$. If this vector space is finite-dimensional, we say that $E$ is a finite extension of $F$ and define the degree of $E$ over $F$ to be the dimension $[E : F]$ of this vector space.

Any subspace $U$ of a finite-dimensional vector space $V$ is again finite-dimensional.
Moreover, $\dim U \le \dim V$, with equality only if $U = V$. If $U_1$ and $U_2$ are subspaces
of $V$, then

$$\dim(U_1 + U_2) + \dim(U_1 \cap U_2) = \dim U_1 + \dim U_2.$$


Let $V$ and $W$ be vector spaces over the same division ring $D$. A map $T : V \to W$
is said to be linear, or a linear transformation, or a ‘vector space homomorphism’, if
for all $v, v' \in V$ and every $\alpha \in D$,

$$T(v + v') = Tv + Tv', \qquad T(\alpha v) = \alpha (Tv).$$


Since the first condition implies that $T$ is a homomorphism of the additive group of $V$
into the additive group of $W$, it follows that $T0 = 0$ and $T(-v) = -Tv$.

For example, if $(\tau_{jk})$ is an $m \times n$ matrix with entries from the division ring $D$, then
the map $T : D^m \to D^n$ defined by

$$T(\alpha_1, \ldots, \alpha_m) = (\beta_1, \ldots, \beta_n),$$

where

$$\beta_k = \alpha_1 \tau_{1k} + \cdots + \alpha_m \tau_{mk} \quad (1 \le k \le n),$$

is linear. It is easily seen that every linear map of $D^m$ into $D^n$ may be obtained in this
way.

As another example, if $\mathcal{C}^1(I)$ is the real vector space of all continuously differentiable
functions $f : I \to \mathbb{R}$, then the map $T : \mathcal{C}^1(I) \to \mathcal{C}(I)$ defined by $Tf = f'$
(the derivative of $f$) is linear.

Let $U, V, W$ be vector spaces over the same division ring $D$. If $T : V \to W$ and
$S : U \to V$ are linear maps, then the composite map $T \circ S : U \to W$ is again
linear. For linear maps it is customary to write $TS$ instead of $T \circ S$. The identity map
$I : V \to V$ defined by $Iv = v$ for every $v \in V$ is clearly linear. If a linear map
$T : V \to W$ is bijective, then its inverse map $T^{-1} : W \to V$ is again linear.
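In coordinates, the map $T : D^m \to D^n$ given by the matrix $(\tau_{jk})$ multiplies the row vector $(\alpha_1, \ldots, \alpha_m)$ on the right by $(\tau_{jk})$. The sketch below uses integer entries, so scalars commute; a comment marks where the order $\alpha_j \tau_{jk}$ would matter over a noncommutative $D$. It also checks a consequence of this row-vector convention that is worked out here rather than stated in the text: if $S$ has matrix $\sigma$ and $T$ has matrix $\tau$, then the composite $TS$ (first $S$, then $T$) has matrix $\sigma\tau$. The function names are hypothetical.

```python
# Illustrative sketch with integer scalars; apply_map and matmul are hypothetical names.


def apply_map(alpha, tau):
    """T(alpha_1..alpha_m) = (beta_1..beta_n) with beta_k = sum_j alpha_j * tau[j][k]."""
    m, n = len(tau), len(tau[0])
    assert len(alpha) == m, "alpha must have one entry per row of tau"
    # alpha_j is written on the left of tau[j][k], as in the text; this order
    # would matter over a noncommutative division ring.
    return tuple(sum(alpha[j] * tau[j][k] for j in range(m)) for k in range(n))


def matmul(sigma, tau):
    """Ordinary matrix product sigma * tau."""
    return [[sum(sigma[i][j] * tau[j][k] for j in range(len(tau)))
             for k in range(len(tau[0]))]
            for i in range(len(sigma))]


tau = [[1, 0], [2, -1], [0, 3]]      # T : Q^3 -> Q^2
sigma = [[1, 2, 0], [0, 1, 1]]       # S : Q^2 -> Q^3

print(apply_map((1, 1, 2), tau))     # (3, 5)
# Composite TS: apply S first, then T; its matrix is sigma * tau in this convention.
print(apply_map(apply_map((1, 1), sigma), tau))   # (7, 0)
print(apply_map((1, 1), matmul(sigma, tau)))      # (7, 0)
```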

If $T : V \to W$ is a linear map, then the set $N$ of all $v \in V$ such that $Tv = 0$ is
a subspace of $V$, called the nullspace or kernel of $T$. Since $Tv = Tv'$ if and only if
$T(v - v') = 0$, the map $T$ is injective if and only if its kernel is $\{0\}$, i.e. when $T$ is
nonsingular.

For any subspace $U$ of $V$, its image $TU = \{Tv : v \in U\}$ is a subspace of $W$. In
particular, $TV$ is a subspace of $W$, called the range of $T$. Thus the map $T$ is surjective
if and only if its range is $W$.

If $V$ is finite-dimensional, then the range $R$ of $T$ is also finite-dimensional and

$$\dim R = \dim V - \dim N$$

(since $R \cong V/N$). The dimensions of $R$ and $N$ are called respectively the rank and
nullity of $T$. It follows that, if $\dim V = \dim W$, then $T$ is injective if and only if it is
surjective, since each condition is equivalent to $\dim R = \dim W$.
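For a concrete check of $\dim R = \dim V - \dim N$, take $V = \mathbb{Q}^4$, $W = \mathbb{Q}^3$ and, for this sketch only, the column-vector convention $x \mapsto Ax$ (the transpose of the convention used above). Reducing $A$ to reduced row echelon form exposes the rank as the number of pivot columns and yields an explicit basis of the kernel, one vector per free column; the kernel is exactly the solution set of the homogeneous system $Ax = 0$. The matrix and function names are illustrative.

```python
from fractions import Fraction

# Illustrative sketch over Q (column convention x -> Ax); names are hypothetical.


def rref(A):
    """Reduced row echelon form of A over Q; returns (R, list of pivot columns)."""
    R = [[Fraction(x) for x in row] for row in A]
    pivots, r = [], 0
    for c in range(len(R[0])):
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue  # column c is free
        R[r], R[p] = R[p], R[r]
        piv = R[r][c]
        R[r] = [x / piv for x in R[r]]  # scale the pivot entry to 1
        for i in range(len(R)):
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
    return R, pivots


def kernel_basis(A):
    """Basis of N = {x : Ax = 0}: set one free variable to 1, the others to 0."""
    R, pivots = rref(A)
    n = len(A[0])
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        x = [Fraction(0)] * n
        x[f] = Fraction(1)
        for row, pc in enumerate(pivots):
            x[pc] = -R[row][f]
        basis.append(x)
    return basis


A = [[1, 2, 1, 0],
     [2, 4, 0, 2],
     [3, 6, 1, 2]]          # third row = first + second
_, pivots = rref(A)
N = kernel_basis(A)
rank, nullity = len(pivots), len(N)
print(rank, nullity, rank + nullity)   # 2 2 4  (= dim V)
```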

Two vector spaces $V, W$ over the same division ring $D$ are said to be isomorphic
if there exists a bijective linear map $T : V \to W$. As an example, if $V$ is an
$n$-dimensional vector space over the division ring $D$, then $V$ is isomorphic to $D^n$. For if
$v_1, \ldots, v_n$ is a basis for $V$ and if $v = \alpha_1 v_1 + \cdots + \alpha_n v_n$ is an arbitrary element of $V$,
the map $v \mapsto (\alpha_1, \ldots, \alpha_n)$ is linear and bijective.

Thus there is essentially only one vector space of given finite dimension over a 
given division ring. However, vector spaces do not always present themselves in the 
concrete form $D^n$. An example is the set of solutions of a system of homogeneous
linear equations with real coefficients. Hence, even if one is only interested in the 
finite-dimensional case, it is still desirable to be acquainted with the abstract definition 
of a vector space. 

Let $V$ and $W$ be vector spaces over the same division ring $D$. We can define the
sum $S + T$ of two linear maps $S : V \to W$ and $T : V \to W$ by

$$(S + T)v = Sv + Tv.$$


This is again a linear map, and it is easily seen that with this definition of addition
the set of all linear maps of $V$ into $W$ is a commutative group. If $D$ is a field, i.e. if
multiplication in $D$ is commutative, then for any $\alpha \in D$ the map $\alpha T$ defined by

$$(\alpha T)v = \alpha (Tv)$$


is again linear, and with these definitions of addition and multiplication by a scalar the 
set of all linear maps of V into W is a vector space over D. (If the division ring D is not 
a field, it is necessary to consider ‘right’ vector spaces over D, as well as ‘left’ ones.) 

If V = W, then the product TS is also defined and it is easily verified that the set 
of all linear maps of V into itself is a ring, with the identity map I as identity element
for multiplication. The bijective linear maps of V to itself are the units of this ring and 
thus form a group under multiplication, the general linear group GL(V). 

Similarly to the direct product of two groups and the direct sum of two rings, one
may define the tensor product $V \otimes V'$ of two vector spaces $V, V'$ and the Kronecker
product $T \otimes T'$ of two linear maps $T : V \to W$ and $T' : V' \to W'$.
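The text only names the Kronecker product. In coordinates, and assuming for this sketch that the product basis $e_j \otimes e'_l$ is ordered lexicographically, $T \otimes T'$ is represented by the Kronecker product of the matrices of $T$ and $T'$. The code checks the defining property $(v \otimes v')(\tau \otimes \tau') = (v\tau) \otimes (v'\tau')$ in the row-vector convention used earlier; the matrices and helper names are illustrative.

```python
# Illustrative sketch; kron, tensor and row_times are hypothetical helper names.


def kron(A, B):
    """Kronecker product: entry ((i, k), (j, l)) equals A[i][j] * B[k][l]."""
    return [[a * b for a in rowA for b in rowB] for rowA in A for rowB in B]


def tensor(v, w):
    """Coordinates of v (x) w in the lexicographically ordered product basis."""
    return [a * b for a in v for b in w]


def row_times(v, M):
    """Row vector v times matrix M."""
    return [sum(v[i] * M[i][k] for i in range(len(M))) for k in range(len(M[0]))]


tau = [[1, 0, 2], [0, 1, 1]]    # T : Q^2 -> Q^3
tau2 = [[2, 1], [1, 0]]         # T': Q^2 -> Q^2
v, w = [1, 2], [3, -1]

lhs = row_times(tensor(v, w), kron(tau, tau2))
rhs = tensor(row_times(v, tau), row_times(w, tau2))
print(lhs == rhs, rhs)   # True [5, 3, 10, 6, 20, 12]
```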

The centre of a ring $R$ is the set of all $c \in R$ such that $ac = ca$ for every $a \in R$. An
associative algebra $A$ over a field $F$ is a ring containing $F$ in its centre. On account of
the ring structure, we can regard $A$ as a vector space over $F$. The associative algebra
is said to be finite-dimensional if it is finite-dimensional as a vector space over $F$.

For example, the set $M_n(F)$ of all $n \times n$ matrices with entries from the field $F$
is a finite-dimensional associative algebra, with the usual definitions of addition and
multiplication, and with $\alpha \in F$ identified with the matrix $\alpha I$.

More generally, if $D$ is a division ring containing $F$ in its centre, then the set
$M_n(D)$ of all $n \times n$ matrices with entries from $D$ is an associative algebra over $F$. It
is finite-dimensional if $D$ itself is finite-dimensional over $F$.

By the definition for rings, an associative algebra $A$ is simple if $A \neq \{0\}$ and $A$
has no ideals except $\{0\}$ and $A$. It is not difficult to show that, for any division ring $D$
containing $F$ in its centre, the associative algebra $M_n(D)$ is simple. It was proved by
Wedderburn (1908) that any finite-dimensional simple associative algebra has the form
$M_n(D)$, where $D$ is a division ring containing $F$ in its centre and of finite dimension
over $F$.

If $F = \mathbb{C}$, the fundamental theorem of algebra implies that $\mathbb{C}$ is the only such $D$. If
$F = \mathbb{R}$, there are three choices for $D$, by the following theorem of Frobenius (1878):


Proposition 31 If a division ring $D$ contains the real field $\mathbb{R}$ in its centre and is of
finite dimension as a vector space over $\mathbb{R}$, then $D$ is isomorphic to $\mathbb{R}$, $\mathbb{C}$ or $\mathbb{H}$.


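Proof Suppose first that $D$ is a field and $D \neq \mathbb{R}$. If $a \in D \setminus \mathbb{R}$ then, since $D$ is finite-dimensional
over $\mathbb{R}$, $a$ is a root of a monic polynomial with real coefficients, which
we may assume to be of minimal degree. Since $a \notin \mathbb{R}$, the degree is not 1 and the
fundamental theorem of algebra implies that it must be 2. Thus

$$a^2 - 2\lambda a + \mu = 0$$

for some $\lambda, \mu \in \mathbb{R}$ with $\lambda^2 < \mu$. Then $\mu - \lambda^2 = \rho^2$ for some nonzero $\rho \in \mathbb{R}$ and
$i = (a - \lambda)/\rho$ satisfies $i^2 = -1$, since $(a - \lambda)^2 = \lambda^2 - \mu = -\rho^2$. Thus $D$ contains the field
$\mathbb{R}(i) = \mathbb{R} + i\mathbb{R}$. But, since $D$ is a field, the only $x \in D$ such that $x^2 = -1$ are $i$ and $-i$. Hence the
preceding argument, applied to an arbitrary $a \in D \setminus \mathbb{R}$, gives $(a - \lambda)/\rho = \pm i$, so that $a \in \mathbb{R}(i)$;
that is, actually $D = \mathbb{R}(i)$. Thus $D$ is isomorphic to the field $\mathbb{C}$ of complex numbers.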

Suppose now that $D$ is not commutative. Let $a$ be an element of $D$ which is not
in the centre of $D$, and let $M$ be an $\mathbb{R}$-subspace of $D$ of maximal dimension which is
commutative and which contains both $a$ and the centre of $D$. If $x \in D$ commutes with
every element of $M$, then $x \in M$. Hence $M$ is a maximal commutative subset of $D$. It
follows that if $x \in M$ and $x \neq 0$ then also $x^{-1} \in M$, since $xy = yx$ for all $y \in M$ implies
$yx^{-1} = x^{-1}y$ for all $y \in M$. Similarly $x, x' \in M$ implies $xx' \in M$. Thus $M$ is a
field which properly contains $\mathbb{R}$. Hence, by the first part of the proof, $M$ is isomorphic
to $\mathbb{C}$. Thus $M = \mathbb{R}(i)$, where $i^2 = -1$, $[M : \mathbb{R}] = 2$ and $\mathbb{R}$ is the centre of $D$.

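If $x \in D \setminus M$, then $b = (x + ixi)/2$ satisfies

$$bi = (xi - ix)/2 = -ib \neq 0,$$

where $xi - ix \neq 0$ because $x$, not lying in $M = \mathbb{R}(i)$, does not commute with $i$.
Hence $b \in D \setminus M$ and $b^2 i = i b^2$. But, in the same way as before, $N = \mathbb{R} + \mathbb{R}b$ is
a maximal subfield of $D$ containing $b$ and $\mathbb{R}$, and $N = \mathbb{R}(j)$, where $j^2 = -1$. Thus
$b^2 = \alpha + \beta b$, where $\alpha, \beta \in \mathbb{R}$. In fact, since $b^2 i = i b^2$ while $b^2 i = \alpha i - \beta ib$ and $ib^2 = \alpha i + \beta ib$,
we must have $\beta = 0$. Similarly $j = \gamma + \delta b$, where $\gamma, \delta \in \mathbb{R}$ and $\delta \neq 0$. Since
$j^2 = \gamma^2 + 2\gamma\delta b + \delta^2 \alpha = -1$, we must have $\gamma = 0$. Thus $j = \delta b$ and $ji = -ij$.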
If we put $k = ij$, it now follows that

$$k^2 = -1, \qquad jk = i = -kj, \qquad ki = j = -ik.$$


Since no $\mathbb{R}$-linear combination of $1, i, j$ has these properties, the elements $1, i, j, k$
are $\mathbb{R}$-linearly independent. But, by Proposition 32 below, $[D : M] = [M : \mathbb{R}] = 2$.
Hence $[D : \mathbb{R}] = 4$ and $1, i, j, k$ are a basis for $D$ over $\mathbb{R}$. Thus $D$ is isomorphic to the
division ring $\mathbb{H}$ of quaternions.
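The multiplication relations obtained above can be checked mechanically. The sketch represents $a + bi + cj + dk$ as the tuple $(a, b, c, d)$ and multiplies with the standard Hamilton product formula (not written out in the text); the names `qmul` and `neg` are illustrative.

```python
# Illustrative sketch: quaternions as 4-tuples (a, b, c, d) = a + bi + cj + dk.


def qmul(p, q):
    """Hamilton product of p = a1 + b1 i + c1 j + d1 k and q = a2 + b2 i + c2 j + d2 k."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)


def neg(p):
    return tuple(-x for x in p)


ONE, I, J, K = (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
MINUS_ONE = neg(ONE)

relations = {
    "i^2 = j^2 = k^2 = -1": qmul(I, I) == qmul(J, J) == qmul(K, K) == MINUS_ONE,
    "ij = k = -ji": qmul(I, J) == K == neg(qmul(J, I)),
    "jk = i = -kj": qmul(J, K) == I == neg(qmul(K, J)),
    "ki = j = -ik": qmul(K, I) == J == neg(qmul(I, K)),
}
print(all(relations.values()))   # True
```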


To complete the proof of Proposition 31 we now prove 


Proposition 32 Let $D$ be a division ring which, as a vector space over its centre $C$, has
finite dimension $[D : C]$. If $M$ is a maximal subfield of $D$, then $[D : M] = [M : C]$.


Proof Put $n = [D : C]$ and let $e_1, \ldots, e_n$ be a basis for $D$ as a vector space over $C$.
Obviously we may suppose $n > 1$. We show first that if $a_1, \ldots, a_n$ are elements of $D$
such that

$$a_1 x e_1 + \cdots + a_n x e_n = 0 \quad \text{for every } x \in D,$$

then $a_1 = \cdots = a_n = 0$. Assume that there exists such a set $\{a_1, \ldots, a_n\}$ with not all
elements zero and choose one with the minimal number of nonzero elements. We may
suppose the notation chosen so that $a_i \neq 0$ for $i \le r$ and $a_i = 0$ for $i > r$ and, by
multiplying on the left by $a_1^{-1}$, we may further suppose that $a_1 = 1$. For any $y \in D$
we have

$$a_1 y x e_1 + \cdots + a_n y x e_n = 0 = y(a_1 x e_1 + \cdots + a_n x e_n)$$

and hence

$$(a_1 y - y a_1) x e_1 + \cdots + (a_n y - y a_n) x e_n = 0.$$

Since $a_i y = y a_i$ for $i = 1$ and for $i > r$, the coefficients $a_i y - y a_i$ have fewer nonzero elements
than the $a_i$, so our choice of $\{a_1, \ldots, a_n\}$ implies that $a_i y = y a_i$ for all $i$. Since this holds for
every $y \in D$, it follows that $a_i \in C$ for all $i$. But this is a contradiction, since $e_1, \ldots, e_n$ is a
basis for $D$ over $C$ and (taking $x = 1$) $a_1 e_1 + \cdots + a_n e_n = 0$.

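The map $T_{jk} : D \to D$ defined by $T_{jk} x = e_j x e_k$ is a linear transformation of
$D$ as a vector space over $C$. By what we have just proved, the $n^2$ linear maps $T_{jk}$
$(j, k = 1, \ldots, n)$ are linearly independent over $C$ (if $\sum_{j,k} c_{jk} e_j x e_k = 0$ for all $x$, apply the result
to $a_k = \sum_j c_{jk} e_j$). Consequently, since the linear transformations of $D$ as a vector space over $C$
form a space of dimension $n^2$, every such transformation is a $C$-linear combination of the maps $T_{jk}$.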

Suppose now that $T : D \to D$ is a linear transformation of $D$ as a vector space
over $M$. Since $C \subseteq M$, $T$ is also a linear transformation of $D$ as a vector space over $C$
and hence has the form

$$Tx = a_1 x e_1 + \cdots + a_n x e_n$$

for some $a_1, \ldots, a_n \in D$. But $T(bx) = b(Tx)$ for all $b \in M$ and $x \in D$. Hence

$$(a_1 b - b a_1) x e_1 + \cdots + (a_n b - b a_n) x e_n = 0 \quad \text{for every } x \in D,$$

which implies $a_i b = b a_i$ $(i = 1, \ldots, n)$. Since this holds for all $b \in M$ and $M$ is a
maximal subfield of $D$, it follows that $a_i \in M$ $(i = 1, \ldots, n)$.

Let $\mathcal{T}$ denote the set of all linear transformations of $D$ as a vector space over $M$.
By what we have already proved, every $T \in \mathcal{T}$ is an $M$-linear combination of the
maps $T_1, \ldots, T_n$, where $T_i x = x e_i$ $(i = 1, \ldots, n)$, and the maps $T_1, \ldots, T_n$ are linearly
independent over $M$. Consequently the dimension of $\mathcal{T}$ as a vector space over $M$
is $n$. But $\mathcal{T}$ has dimension $[D : M]^2$ as a vector space over $M$. Hence $[D : M]^2 = n$.
Since $n = [D : M][M : C]$, it follows that $[D : M] = [M : C]$.
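For $D = \mathbb{H}$, with centre $C = \mathbb{R}$ and basis $e_1, e_2, e_3, e_4 = 1, i, j, k$, the first step of the proof says that the $n^2 = 16$ maps $x \mapsto e_j x e_k$ are $\mathbb{R}$-linearly independent. The sketch below, an illustration under these choices rather than part of the proof, writes each map as its $4 \times 4$ real matrix flattened to 16 numbers and confirms that the 16 resulting vectors have rank 16, so they span all $\mathbb{R}$-linear maps of $\mathbb{H}$. (Here $M = \mathbb{C}$, and indeed $[\mathbb{H} : \mathbb{C}] = [\mathbb{C} : \mathbb{R}] = 2$.) Helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Illustrative check for D = H over C = R; helper names are hypothetical.


def qmul(p, q):
    """Hamilton product of quaternions given as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)


E = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]   # 1, i, j, k


def flattened_matrix(j, k):
    """Images of the basis under T_jk : x -> e_j x e_k, concatenated (16 numbers)."""
    return [c for x in E for c in qmul(qmul(E[j], x), E[k])]


def rank(vectors):
    """Rank over Q by Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for c in range(len(rows[0])):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


maps = [flattened_matrix(j, k) for j, k in product(range(4), repeat=2)]
print(len(maps), rank(maps))   # 16 16
```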


10 Inner Product Spaces 


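Let $F$ denote either the real field $\mathbb{R}$ or the complex field $\mathbb{C}$. A vector space $V$ over $F$ is
said to be an inner product space if there exists a map $(u, v) \mapsto \langle u, v \rangle$ of $V \times V$ into
$F$ such that for every $\alpha \in F$ and all $u, u', v \in V$,

(i) $\langle \alpha u, v \rangle = \alpha \langle u, v \rangle$,
(ii) $\langle u + u', v \rangle = \langle u, v \rangle + \langle u', v \rangle$,
(iii) $\langle v, u \rangle = \overline{\langle u, v \rangle}$,
(iv) $\langle u, u \rangle > 0$ if $u \neq 0$.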

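If $F = \mathbb{R}$, then (iii) simply says that $\langle v, u \rangle = \langle u, v \rangle$, since a real number is its own
complex conjugate. The restriction $u \neq 0$ is necessary in (iv), since (i) and (iii) imply
that

$$\langle u, 0 \rangle = \langle 0, v \rangle = 0 \quad \text{for all } u, v \in V.$$

It follows from (ii) and (iii) that

$$\langle u, v + v' \rangle = \langle u, v \rangle + \langle u, v' \rangle \quad \text{for all } u, v, v' \in V,$$

and from (i) and (iii) that

$$\langle u, \alpha v \rangle = \bar{\alpha} \langle u, v \rangle \quad \text{for every } \alpha \in F \text{ and all } u, v \in V.$$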


The standard example of an inner product space is the vector space $F^n$, with the
inner product of $x = (\xi_1, \ldots, \xi_n)$ and $y = (\eta_1, \ldots, \eta_n)$ defined by

$$\langle x, y \rangle = \xi_1 \bar{\eta}_1 + \cdots + \xi_n \bar{\eta}_n.$$


Another example is the vector space $\mathcal{C}(I)$ of all continuous functions $f : I \to F$,
where $I = [a, b]$ is a compact subinterval of $\mathbb{R}$, with the inner product of $f$ and $g$
defined by

$$\langle f, g \rangle = \int_a^b f(t) \overline{g(t)} \, dt.$$


In an arbitrary inner product space $V$ we define the norm $\|v\|$ of a vector $v \in V$ by

$$\|v\| = \langle v, v \rangle^{1/2}.$$

Thus $\|v\| \ge 0$, with equality if and only if $v = 0$. Evidently

$$\|\alpha v\| = |\alpha| \, \|v\| \quad \text{for all } \alpha \in F \text{ and } v \in V.$$

Inner products and norms are connected by Schwarz's inequality:

$$|\langle u, v \rangle| \le \|u\| \, \|v\| \quad \text{for all } u, v \in V,$$


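with equality if and only if $u$ and $v$ are linearly dependent. For the proof we may suppose
that $u$ and $v$ are linearly independent, since it is easily seen that equality holds if
$u = \lambda v$ or $v = \lambda u$ for some $\lambda \in F$. Then, for all $\alpha, \beta \in F$, not both 0,

$$0 < \langle \alpha u + \beta v, \alpha u + \beta v \rangle = |\alpha|^2 \langle u, u \rangle + \alpha \bar{\beta} \langle u, v \rangle + \bar{\alpha} \beta \overline{\langle u, v \rangle} + |\beta|^2 \langle v, v \rangle.$$

If we choose $\alpha = \langle v, v \rangle$ and $\beta = -\langle u, v \rangle$, this takes the form

$$0 < \|u\|^2 \|v\|^4 - 2\|v\|^2 |\langle u, v \rangle|^2 + |\langle u, v \rangle|^2 \|v\|^2 = \{\|u\|^2 \|v\|^2 - |\langle u, v \rangle|^2\} \|v\|^2.$$

Hence

$$|\langle u, v \rangle|^2 < \|u\|^2 \|v\|^2,$$

as we wished to show. We follow common practice by naming the inequality after
Schwarz (1885), but (cf. §4) it had already been proved for $\mathbb{R}^n$ by Cauchy (1821) and
for $\mathcal{C}(I)$ by Bunyakovskii (1859).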

It follows from Schwarz's inequality that

$$\|u + v\|^2 = \|u\|^2 + 2\,\mathrm{Re}\,\langle u, v \rangle + \|v\|^2 \le \|u\|^2 + 2|\langle u, v \rangle| + \|v\|^2 \le \{\|u\| + \|v\|\}^2.$$

Thus

$$\|u + v\| \le \|u\| + \|v\| \quad \text{for all } u, v \in V,$$

with strict inequality if $u$ and $v$ are linearly independent.
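A quick numerical illustration in $\mathbb{C}^3$ with the standard inner product $\langle x, y \rangle = \xi_1\bar\eta_1 + \cdots + \xi_n\bar\eta_n$; the vectors are arbitrary choices for this sketch. Schwarz's inequality and the triangle inequality hold strictly for the independent pair $u, v$, while Schwarz's inequality becomes an equality for $u$ and a scalar multiple of $u$.

```python
import math

# Illustrative sketch: the vectors below are arbitrary choices, not from the text.


def inner(x, y):
    """<x, y> = sum of x_j * conj(y_j): linear in x, conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def norm(x):
    return math.sqrt(inner(x, x).real)


u = [1 + 2j, -1j, 3]
v = [2 - 1j, 1 + 1j, -1]
w = [(2 - 3j) * a for a in u]          # linearly dependent on u

print(inner(u, v))                                               # (-4+4j)
print(abs(inner(u, v)) < norm(u) * norm(v))                      # True (strict: u, v independent)
print(norm([a + b for a, b in zip(u, v)]) < norm(u) + norm(v))   # True
print(math.isclose(abs(inner(u, w)), norm(u) * norm(w)))         # True (equality case)
```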
It now follows that $V$ acquires the structure of a metric space if we define the
distance between $u$ and $v$ by

$$d(u, v) = \|u - v\|.$$

In the case $V = \mathbb{R}^n$ this is the Euclidean distance

$$d(x, y) = \Big( \sum_{j=1}^n |\xi_j - \eta_j|^2 \Big)^{1/2}$$

and in the case $V = \mathcal{C}(I)$ it is the $L^2$-norm

$$d(f, g) = \Big( \int_a^b |f(t) - g(t)|^2 \, dt \Big)^{1/2}.$$
The norm in any inner product space $V$ satisfies the parallelogram law:

$$\|u + v\|^2 + \|u - v\|^2 = 2\|u\|^2 + 2\|v\|^2 \quad \text{for all } u, v \in V.$$

This may be immediately verified by substituting $\|w\|^2 = \langle w, w \rangle$ throughout and
using the linearity of the inner product. The geometrical interpretation is that in any
parallelogram the sum of the squares of the lengths of the two diagonals is equal to the
sum of the squares of the lengths of all four sides.

It may be shown that any normed vector space which satisfies the parallelogram
law can be given the structure of an inner product space by defining

$$\langle u, v \rangle = \{\|u + v\|^2 - \|u - v\|^2\}/4 \quad \text{if } F = \mathbb{R},$$

$$\langle u, v \rangle = \{\|u + v\|^2 - \|u - v\|^2 + i\|u + iv\|^2 - i\|u - iv\|^2\}/4 \quad \text{if } F = \mathbb{C}.$$

(Cf. the argument for $F = \mathbb{Q}$ in §4 of Chapter XIII.)
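The complex polarization formula can be tested numerically: starting only from the Euclidean norm on $\mathbb{C}^3$, it recovers the inner product. For contrast, the maximum norm $\max_j |\xi_j|$ (a standard norm that the text does not mention) violates the parallelogram law, so no inner product induces it. Vectors and function names are illustrative.

```python
import cmath
import math

# Illustrative sketch; vectors and helper names are hypothetical.


def inner(x, y):
    return sum(a * b.conjugate() for a, b in zip(x, y))


def euclidean(x):
    return math.sqrt(inner(x, x).real)


def max_norm(x):
    return max(abs(a) for a in x)


def polarize(norm, u, v):
    """{||u+v||^2 - ||u-v||^2 + i||u+iv||^2 - i||u-iv||^2} / 4 (complex case)."""
    def sq(s):  # ||u + s v||^2
        return norm([a + s * b for a, b in zip(u, v)]) ** 2
    return (sq(1) - sq(-1) + 1j * sq(1j) - 1j * sq(-1j)) / 4


def parallelogram_defect(norm, u, v):
    """||u+v||^2 + ||u-v||^2 - 2||u||^2 - 2||v||^2; zero for norms coming from an inner product."""
    plus = norm([a + b for a, b in zip(u, v)])
    minus = norm([a - b for a, b in zip(u, v)])
    return plus ** 2 + minus ** 2 - 2 * norm(u) ** 2 - 2 * norm(v) ** 2


u = [1 + 2j, -1j, 3]
v = [2 - 1j, 1 + 1j, -1]
print(cmath.isclose(polarize(euclidean, u, v), inner(u, v)))   # True
print(parallelogram_defect(max_norm, [1, 0], [0, 1]))          # -2
```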

In an arbitrary inner product space $V$ a vector $u$ is said to be ‘perpendicular’ or
orthogonal to a vector $v$ if $\langle u, v \rangle = 0$. The relation is symmetric, since $\langle u, v \rangle = 0$
implies $\langle v, u \rangle = 0$. For orthogonal vectors $u, v$, the law of Pythagoras holds:

$$\|u + v\|^2 = \|u\|^2 + \|v\|^2.$$


More generally, a subset $E$ of $V$ is said to be orthogonal if $\langle u, v \rangle = 0$ for all
$u, v \in E$ with $u \neq v$. It is said to be orthonormal if, in addition, $\langle u, u \rangle = 1$ for
every $u \in E$. An orthogonal set which does not contain 0 may be converted into an
orthonormal set by replacing each $u \in E$ by $u/\|u\|$.

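For example, if $V = F^n$, then the basis vectors

$$e_1 = (1, 0, \ldots, 0), \quad e_2 = (0, 1, 0, \ldots, 0), \quad \ldots, \quad e_n = (0, \ldots, 0, 1)$$

form an orthonormal set. It is easily verified also that, if $I = [0, 1]$, then in $\mathcal{C}(I)$ the
functions $e_n(t) = e^{2\pi i n t}$ $(n \in \mathbb{Z})$ form an orthonormal set.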

Let $\{e_1, \ldots, e_m\}$ be any orthonormal set in the inner product space $V$ and let
$U$ be the vector subspace generated by $e_1, \ldots, e_m$. Then the norm of a vector
$u = \alpha_1 e_1 + \cdots + \alpha_m e_m \in U$ is given by

$$\|u\|^2 = |\alpha_1|^2 + \cdots + |\alpha_m|^2,$$

which shows that $e_1, \ldots, e_m$ are linearly independent.
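To find the best approximation in $U$ to a given vector $v \in V$, put

$$w = \gamma_1 e_1 + \cdots + \gamma_m e_m,$$

where

$$\gamma_j = \langle v, e_j \rangle \quad (j = 1, \ldots, m).$$

Then $\langle w, e_j \rangle = \langle v, e_j \rangle$ $(j = 1, \ldots, m)$ and hence $\langle v - w, w \rangle = 0$. Consequently, by
the law of Pythagoras,

$$\|v\|^2 = \|v - w\|^2 + \|w\|^2.$$
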
Since $\|w\|^2 = |\gamma_1|^2 + \cdots + |\gamma_m|^2$, this yields Bessel's inequality:

$$|\langle v, e_1 \rangle|^2 + \cdots + |\langle v, e_m \rangle|^2 \le \|v\|^2,$$

with strict inequality if $v \notin U$. For any $u \in U$, we also have $\langle v - w, w - u \rangle = 0$ and
so, by Pythagoras again,

$$\|v - u\|^2 = \|v - w\|^2 + \|w - u\|^2.$$

This shows that $w$ is the unique nearest point of $U$ to $v$.
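A small real example of the best-approximation recipe, with numbers chosen for this sketch: in $\mathbb{R}^4$ take the orthonormal vectors $e_1 = (1, 1, 0, 0)/\sqrt{2}$, $e_2 = (0, 0, 1, 0)$ and $v = (3, 1, 2, 5)$. Then $\gamma_1 = 2\sqrt{2}$, $\gamma_2 = 2$ and $w = (2, 2, 2, 0)$; Bessel's inequality reads $12 \le 39$, strictly since $v \notin U$. The code also samples random points $u \in U$ and confirms none is closer to $v$ than $w$. Helper names are hypothetical.

```python
import math
import random

# Illustrative sketch in R^4; the vectors and helper names are hypothetical.


def inner(x, y):
    """Real inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))


def dist(x, y):
    d = [a - b for a, b in zip(x, y)]
    return math.sqrt(inner(d, d))


def best_approximation(v, onb):
    """Return w = sum of <v, e_j> e_j over the orthonormal list onb, and the coefficients."""
    gammas = [inner(v, e) for e in onb]
    w = [sum(g * e[i] for g, e in zip(gammas, onb)) for i in range(len(v))]
    return w, gammas


s = 1 / math.sqrt(2)
onb = [[s, s, 0, 0], [0, 0, 1, 0]]
v = [3, 1, 2, 5]
w, gammas = best_approximation(v, onb)
print([round(x, 12) for x in w])                    # [2.0, 2.0, 2.0, 0.0]
print(sum(g * g for g in gammas), inner(v, v))      # ||w||^2 (about 12) and ||v||^2 = 39

# Any other point u of U is at least as far from v as w is.
random.seed(0)
for _ in range(1000):
    a, b = random.uniform(-10, 10), random.uniform(-10, 10)
    u = [a * x + b * y for x, y in zip(*onb)]
    assert dist(v, u) >= dist(v, w) - 1e-12
```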

From any linearly independent set of vectors $v_1, \ldots, v_m$ we can inductively
construct an orthonormal set $e_1, \ldots, e_m$ such that $e_1, \ldots, e_k$ span the same vector subspace
as $v_1, \ldots, v_k$ for $1 \le k \le m$. We begin by taking $e_1 = v_1/\|v_1\|$. Now suppose
$e_1, \ldots, e_k$ have been determined. If

$$w = v_{k+1} - \langle v_{k+1}, e_1 \rangle e_1 - \cdots - \langle v_{k+1}, e_k \rangle e_k,$$

then $\langle w, e_j \rangle = 0$ $(j = 1, \ldots, k)$. Moreover $w \neq 0$, since $w$ is a linear combination of
$v_1, \ldots, v_{k+1}$ in which the coefficient of $v_{k+1}$ is 1. By taking $e_{k+1} = w/\|w\|$, we obtain
an orthonormal set $e_1, \ldots, e_{k+1}$ spanning the same linear subspace as $v_1, \ldots, v_{k+1}$.
This construction is known as Schmidt's orthogonalization process, because of its use
by E. Schmidt (1907) in his treatment of linear integral equations. The (normalized)
Legendre polynomials are obtained by applying the process to the linearly independent
functions $1, t, t^2, \ldots$ in the space $\mathcal{C}(I)$, where $I = [-1, 1]$.
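The Legendre example can be carried out exactly. The sketch below represents a polynomial by its list of coefficients (constant term first), computes $\langle p, q \rangle = \int_{-1}^{1} p(t) q(t)\,dt$ from $\int_{-1}^{1} t^m\,dt = 2/(m+1)$ for even $m$ and $0$ for odd $m$, and, as a deliberate variation on the process above, orthogonalizes without normalizing so that no square roots appear. The outputs are therefore the monic multiples of the Legendre polynomials; dividing each by its norm gives the normalized ones. Function names are illustrative.

```python
from fractions import Fraction

# Illustrative sketch: exact Gram-Schmidt on 1, t, ..., t^4 in C([-1, 1]),
# without the normalization step; names are hypothetical.


def inner(p, q):
    """<p, q> = integral over [-1, 1] of p(t) q(t) dt, from the coefficient lists."""
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:                 # odd powers integrate to 0
                total += a * b * Fraction(2, i + j + 1)
    return total


def orthogonalize(vectors):
    """Schmidt's process without normalization: subtract projections onto earlier outputs."""
    result = []
    for v in vectors:
        w = [Fraction(c) for c in v]
        for u in result:
            coeff = inner(v, u) / inner(u, u)
            w = [wi - coeff * ui for wi, ui in zip(w, u)]
        result.append(w)
    return result


degree = 4
monomials = [[1 if k == m else 0 for k in range(degree + 1)] for m in range(degree + 1)]
polys = orthogonalize(monomials)
for p in polys:
    print([str(c) for c in p])
# Printed lists, constant term first:
# ['1', '0', '0', '0', '0']         1
# ['0', '1', '0', '0', '0']         t
# ['-1/3', '0', '1', '0', '0']      t^2 - 1/3
# ['0', '-3/5', '0', '1', '0']      t^3 - 3t/5
# ['3/35', '0', '-6/7', '0', '1']   t^4 - 6t^2/7 + 3/35
```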

It follows that any finite-dimensional inner product space $V$ has an orthonormal
basis $e_1, \ldots, e_n$ and that

$$\|v\|^2 = \sum_{j=1}^n |\langle v, e_j \rangle|^2 \quad \text{for every } v \in V.$$

In an infinite-dimensional inner product space $V$ an orthonormal set $E$ may even
be uncountably infinite. However, for a given $v \in V$, there are at most countably many
vectors $e \in E$ for which $\langle v, e \rangle \neq 0$. For if $\{e_1, \ldots, e_m\}$ is any finite subset of $E$ then,
by Bessel's inequality,

$$\sum_{j=1}^m |\langle v, e_j \rangle|^2 \le \|v\|^2$$

and so, for each $n \in \mathbb{N}$, there are at most $n^2 - 1$ vectors $e \in E$ for which
$|\langle v, e \rangle| > \|v\|/n$ (since $n^2$ such vectors would give a sum exceeding $\|v\|^2$).

If the vector subspace $U$ of all finite linear combinations of elements of $E$ is dense
in $V$ then, by the best approximation property of finite orthonormal sets, Parseval's
equality holds:

$$\sum_{e \in E} |\langle v, e \rangle|^2 = \|v\|^2 \quad \text{for every } v \in V.$$

Parseval's equality holds for the inner product space $\mathcal{C}(I)$, where $I = [0, 1]$, and
the orthonormal set $E = \{e^{2\pi i n t} : n \in \mathbb{Z}\}$ since, by Weierstrass's approximation theorem
(see the references in §6 of Chapter XI), every $f \in \mathcal{C}(I)$ with $f(0) = f(1)$ is the uniform limit of
a sequence of trigonometric polynomials, and such functions are dense in $\mathcal{C}(I)$ with respect
to the norm $\|\cdot\|$. The result in this case was formally derived by Parseval (1805).
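As an illustration, with a function chosen for this sketch, take $f(t) = t$ on $I = [0, 1]$. Then $\|f\|^2 = \int_0^1 t^2\,dt = 1/3$, $\langle f, e_0 \rangle = 1/2$, and integration by parts gives $\langle f, e_n \rangle = \int_0^1 t e^{-2\pi i n t}\,dt = i/(2\pi n)$ for $n \neq 0$. The partial sums of Parseval's series stay below $1/3$, as Bessel's inequality requires, and approach it; the gap after the terms with $|n| \le N$ is $\frac{1}{2\pi^2}\sum_{n > N} n^{-2}$, roughly $1/(2\pi^2 N)$.

```python
import math

# Illustrative sketch for f(t) = t on [0, 1]; the closed-form coefficients come from
# integrating by parts, and the function names are hypothetical.


def fourier_coefficient(n):
    """<f, e_n> for f(t) = t on [0, 1] and e_n(t) = exp(2 pi i n t)."""
    return 0.5 if n == 0 else 1j / (2 * math.pi * n)


def parseval_partial_sum(N):
    """Sum of |<f, e_n>|^2 over |n| <= N."""
    return sum(abs(fourier_coefficient(n)) ** 2 for n in range(-N, N + 1))


norm_squared = 1 / 3   # ||f||^2 = integral of t^2 over [0, 1]
for N in (1, 10, 100, 1000):
    gap = norm_squared - parseval_partial_sum(N)
    print(N, gap, gap * 2 * math.pi ** 2 * N)   # last column tends to 1
```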
An almost periodic function, in the sense of Bohr (1925), is a function $f : \mathbb{R} \to \mathbb{C}$
which can be uniformly approximated on $\mathbb{R}$ by generalized trigonometric polynomials

$$\sum_{j=1}^m c_j e^{i \lambda_j t},$$

where $c_j \in \mathbb{C}$ and $\lambda_j \in \mathbb{R}$ $(j = 1, \ldots, m)$. For any almost periodic functions $f, g$, the
limit

$$\langle f, g \rangle = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} f(t) \overline{g(t)} \, dt$$

exists. The set of all almost periodic functions acquires in this way the structure of
an inner product space. The set $E = \{e^{i\lambda t} : \lambda \in \mathbb{R}\}$ is an uncountable orthonormal set
and Parseval's equality holds for this set.

A finite-dimensional inner product space is necessarily complete as a metric space,
i.e., every fundamental sequence converges. However, an infinite-dimensional inner
product space need not be complete, as $\mathcal{C}(I)$ already illustrates. An inner product
space which is complete is said to be a Hilbert space.

The case considered by Hilbert (1906) was the vector space $\ell^2$ of all infinite
sequences $x = (\xi_1, \xi_2, \ldots)$ of complex numbers such that $\sum_{k \ge 1} |\xi_k|^2 < \infty$, with

$$\langle x, y \rangle = \sum_{k \ge 1} \xi_k \bar{\eta}_k.$$

Another example is the vector space $L^2(I)$, where $I = [0, 1]$, of all (equivalence
classes of) Lebesgue measurable functions $f : I \to \mathbb{C}$ such that $\int_0^1 |f(t)|^2 \, dt < \infty$,
with

$$\langle f, g \rangle = \int_0^1 f(t) \overline{g(t)} \, dt.$$

With any $f \in L^2(I)$ we can associate a sequence in $\ell^2$, consisting of the inner
products $\langle f, e_n \rangle$, where $e_n(t) = e^{2\pi i n t}$ $(n \in \mathbb{Z})$, in some fixed order. The map
$\mathcal{F} : L^2(I) \to \ell^2$ thus defined is linear and, by Parseval's equality,

$$\|\mathcal{F} f\| = \|f\|.$$

In fact $\mathcal{F}$ is an isometry of $L^2(I)$ onto $\ell^2$ since, by the theorem of Riesz–Fischer (1907), it is bijective.


11 Further Remarks 


A vast fund of information about numbers in different cultures is contained in 
Menninger [52]. A good popular book is Dantzig [18]. 

The algebra of sets was created by Boole (1847), who used the symbols $+$ and $\cdot$
instead of $\cup$ and $\cap$, as is now customary. His ideas were further developed, with appli-
cations to logic and probability theory, in Boole [10]. A simple system of axioms for
Boolean algebra was given by Huntington [39]. For an introduction to Stone’s repre-
sentation theorem, referred to in §8, see Stone [69]; there are proofs in Halmos [30] and
Sikorski [66]. For applications of Boolean algebras to switching circuits see, for ex- 
ample, Rudeanu [62]. Boolean algebra is studied in the more general context of lattice 
theory in Birkhoff [6]. 

Dedekind’s axioms for N may be found on p. 67 of [19], which contains also his 
earlier construction of the real numbers from the rationals by means of cuts. Some 
interesting comments on the axioms (N1)-(N3) are contained in Henkin [34]. Start- 
ing from these axioms, Landau [47] gives a detailed derivation of the basic properties 
of N, Q, R and C. 

The argument used to extend N to Z shows that any commutative semigroup sat- 
isfying the cancellation law may be embedded in a commutative group. The argument 
used to extend Z to Q shows that any commutative ring without divisors of zero may 
be embedded in a field. 

An example of an ordered field which does not have the Archimedean prop-
erty, although every fundamental sequence is (trivially) convergent, is the field $^*\mathbb{R}$ of
hyperreal numbers, constructed by Abraham Robinson (1961). Hyperreal numbers are 
studied in Stroyan and Luxemburg [70]. 

The ‘arithmetization of analysis’ had a gradual evolution, which is traced in 
Chapitre VI (by Dugac) of Dieudonné et al. [22]. A modern text on real analysis 
is Rudin [63]. In Lemma 7 of Chapter VI we will show that all norms on $\mathbb{R}^n$ are
equivalent. 

The contraction principle (Proposition 26) has been used to prove the central 
limit theorem of probability theory by Hamedani and Walter [32]. Bessaga (1959) has 
proved a converse of the contraction principle: Let $E$ be an arbitrary set, $f : E \to E$ a
map of $E$ to itself and $\theta$ a real number such that $0 < \theta < 1$. If each iterate $f^n$ $(n \in \mathbb{N})$
has at most one fixed point and if some iterate has a fixed point, then a complete metric
$d$ can be defined on $E$ such that $d(f(x'), f(x'')) \le \theta\, d(x', x'')$ for all $x', x'' \in E$. A
short proof is given by Jachymski [40].

There are other important fixed point theorems besides Proposition 26. Brouwer’s 
fixed point theorem states that, if $B = \{x \in \mathbb{R}^n : |x| \le 1\}$ is the $n$-dimensional closed
unit ball, every continuous map $f : B \to B$ has a fixed point. For an elementary
proof, see Kulpa [44]. The Lefschetz fixed point theorem requires a knowledge of al- 
gebraic topology, even for its statement. Fixed point theorems are extensively treated 
in Dugundji and Granas [23] (and in A. Granas and J. Dugundji, Fixed Point Theory, 
Springer-Verlag, New York, 2003). 

For a more detailed discussion of differentiability for functions of several variables 
see, for example, Fleming [26] and Dieudonné [21]. The inverse function theorem 
(Proposition 27) is a local result. Some global results are given by Atkinson [5] and 
Chichilnisky [14]. For a holomorphic version of Proposition 28 and for the simple way 
in which higher-order equations may be replaced by systems of first-order equations 
see, e.g., Coddington and Levinson [16]. 

The formula for the roots of a cubic was first published by Cardano [12], but it 
was discovered by del Ferro and again by Tartaglia, who accused Cardano of breaking 
a pledge of secrecy. Cardano is judged less harshly by historians today than previ- 
ously. His book, which contained developments of his own and also the formula for 
the roots of a quartic discovered by his pupil Ferrari, was the most significant Western
contribution to mathematics for more than a thousand years. 

Proposition 29 still holds, but is more difficult to prove, if in its statement “has a 
nonzero derivative” is replaced by “which is not constant”. Read [57] shows that the 
basic results of complex analysis may be deduced from this stronger form of Proposi- 
tion 29 without the use of complex integration. 

A field F is said to be algebraically closed if every polynomial of positive degree 
with coefficients from F has a root in F. Thus the ‘fundamental theorem of algebra’ 
says that the field C of complex numbers is algebraically closed. The proofs of this 
theorem due to Argand–Cauchy and Euler–Lagrange–Laplace are given in Chapter 4
(by Remmert) of Ebbinghaus et al. [24]. As shown on p. 77 of [24], the latter method 
provides, in particular, a simple proof for the existence of n-th roots. 

Wall [72] gives a proof of the fundamental theorem of algebra, based on the notion 
of topological degree, and Ahlfors [1] gives the most common complex analysis proof, 
based on Liouville’s theorem that a function holomorphic in the whole complex plane 
is bounded only if it is a constant. A form of Liouville’s theorem is easily deduced 
from Proposition 29: if the power series 


$$p(z) = a_0 + a_1 z + a_2 z^2 + \cdots$$

converges and $|p(z)|$ is bounded for all $z \in \mathbb{C}$, then $a_n = 0$ for every $n \ge 1$.

The representation of trigonometric functions by complex exponentials appears in 
§138 of Euler [25]. The various algebraic formulas involving trigonometric functions, 
such as 


$$\cos 3x = 4\cos^3 x - 3\cos x,$$


are easily established by means of this representation and the addition theorem for the 
exponential function. 
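For instance, the formula for $\cos 3x$ follows from $e^{3ix} = (e^{ix})^3$ by taking real parts and then using $\sin^2 x = 1 - \cos^2 x$:

```latex
\cos 3x = \operatorname{Re} e^{3ix} = \operatorname{Re} (\cos x + i \sin x)^3
        = \cos^3 x - 3 \cos x \sin^2 x
        = \cos^3 x - 3 \cos x \, (1 - \cos^2 x)
        = 4 \cos^3 x - 3 \cos x .
```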

Some texts on complex analysis are Ahlfors [1], Carathéodory [11] and
Narasimhan [56]. 

The 19th century literature on quaternions is surveyed in Rothe [59]. Although 
Hamilton hoped that quaternions would prove as useful as complex numbers, a quater- 
nionic analysis analogous to complex analysis was first developed by Fueter (1935). A 
good account is given by Sudbery [71]. 

One significant contribution of quaternions was indirect. After Hamilton had 
shown the way, other ‘hypercomplex’ number systems were constructed, which led 
eventually to the structure theory of associative algebras discussed below. 

It is not difficult to show that any automorphism of $\mathbb{H}$, i.e. any bijective map
$T : \mathbb{H} \to \mathbb{H}$ such that

$$T(x + y) = Tx + Ty, \qquad T(xy) = (Tx)(Ty) \quad \text{for all } x, y \in \mathbb{H},$$

has the form $Tx = uxu^{-1}$ for some quaternion $u$ with norm 1.

For octonions and their uses, see van der Blij [8] and Springer and Veldkamp [67]. 
The group of all automorphisms of the algebra O is the exceptional simple Lie group 
G2. The other four exceptional simple Lie groups are also all related to O in some way. 
Of wider significance are the associative algebras introduced in 1878 by
Clifford [15] (pp. 266-276) as a common generalization of quaternions and Grassmann 
algebra. Clifford algebras were used by Lipschitz (1886) to represent orthogonal trans- 
formations in n-dimensional space. There is an extensive discussion of Clifford alge- 
bras in Deheuvels [20]. For their applications in physics, see Salingaros and Wene [64]. 

Proposition 32 has many uses. The proof given here is extracted from Nagahara 
and Tominaga [55]. 

It was proved by both Kervaire (1958) and Milnor (1958) that if a division 
algebra A (not necessarily associative) contains the real field R in its centre and is of 
finite dimension as a vector space over R, then this dimension must be 1,2,4 or 8 
(but the algebra need not be isomorphic to R, C, H or O). All known proofs use deep 
results from algebraic topology, which was first applied to the problem by H. Hopf 
(1940). For more information about the proof, see Chapter 11 (by Hirzebruch) of 
Ebbinghaus et al. [24].

When is the product of two sums of squares again a sum of squares? To make
the question precise, call a triple $(r, s, t)$ of positive integers ‘admissible’ if there
exist real numbers $\rho_{ijk}$ $(1 \le i \le t, 1 \le j \le r, 1 \le k \le s)$ such that, for every
$x = (\xi_1, \ldots, \xi_r) \in \mathbb{R}^r$ and every $y = (\eta_1, \ldots, \eta_s) \in \mathbb{R}^s$,

$$(\xi_1^2 + \cdots + \xi_r^2)(\eta_1^2 + \cdots + \eta_s^2) = \zeta_1^2 + \cdots + \zeta_t^2,$$

where

$$\zeta_i = \sum_{j=1}^r \sum_{k=1}^s \rho_{ijk} \xi_j \eta_k.$$

The question then becomes, which triples $(r, s, t)$ are admissible? It is obvious
that $(1, 1, 1)$ is admissible and the relation $n(x)n(y) = n(xy)$ for the norms of complex
numbers, quaternions and octonions shows that $(t, t, t)$ is admissible also for
$t = 2, 4, 8$. It was proved by Hurwitz (1898) that $(t, t, t)$ is admissible for no other
values of $t$. A survey of the general problem is given by Shapiro [65].
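The case $t = 4$ can be made explicit. Writing quaternions as 4-tuples and reading off $\rho_{ijk}$ as the $i$-th coordinate of the product of the $j$-th and $k$-th basis quaternions (so that every $\rho_{ijk}$ is $0$ or $\pm 1$), the sketch below checks the identity that defines admissibility of $(4, 4, 4)$ on random integer vectors. The Hamilton product formula is standard; the helper names are illustrative.

```python
import random
from itertools import product

# Illustrative sketch of the (4, 4, 4) case via quaternions; names are hypothetical.


def qmul(p, q):
    """Hamilton product of quaternions given as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)


E = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]
# rho[i][j][k] = i-th coordinate of e_j e_k, so that zeta_i = sum_jk rho_ijk xi_j eta_k.
rho = [[[qmul(E[j], E[k])[i] for k in range(4)] for j in range(4)] for i in range(4)]


def zeta(x, y):
    return [sum(rho[i][j][k] * x[j] * y[k] for j, k in product(range(4), repeat=2))
            for i in range(4)]


def sum_sq(v):
    return sum(c * c for c in v)


random.seed(1)
for _ in range(1000):
    x = [random.randint(-50, 50) for _ in range(4)]
    y = [random.randint(-50, 50) for _ in range(4)]
    assert sum_sq(x) * sum_sq(y) == sum_sq(zeta(x, y))
print("(4, 4, 4) identity verified on 1000 random integer pairs")
```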

General introductions to algebra are provided by Birkhoff and MacLane [7] and 
Herstein [35]. More extended treatments are given in Jacobson [41] and Lang [48]. 

The theory of groups is treated in M. Hall [29] and Rotman [60]. An especially 
significant class of groups is studied in Humphreys [38]. 

If H is a subgroup of a finite group G, then it is possible to choose a system of left 
coset representatives of H which is also a system of right coset representatives. This 
interesting, but not very useful, fact belongs to combinatorics rather than to group the- 
ory. We mention it because it was the motivation for the theorem of P. Hall (1935) on 
systems of distinct representatives, also known as the ‘marriage theorem’. Further de- 
velopments are described in Mirsky [53]. For quantitative versions, with applications 
to operations research, see Ford and Fulkerson [27]. 

The theory of rings separates into two parts. Noncommutative ring theory, which 
now incorporates the structure theory of associative algebras, is studied in the books 
of Herstein [36], Kasch [42] and Lam [46]. Commutative ring theory, which grew 
out of algebraic number theory and algebraic geometry, is studied in Atiyah and 
Macdonald [4] and Kunz [45]. 
Field theory was established as an independent subject of study in 1910 by
Steinitz [68]. The books of Jacobson [41] and Lang [48] treat also the more recent 
theory of ordered fields, due to Artin and Schreier (1927). 

Fields and groups are connected with one another by Galois theory. This subject 
has its origin in attempts to solve polynomial equations ‘by radicals’. The founder of
the subject is really Lagrange (1770/1). By developing his ideas, Ruffini (1799) and 
Abel (1826) showed that polynomial equations of degree greater than 4 cannot, in gen- 
eral, be solved by radicals. Abel (1829) later showed that polynomial equations can be 
solved by radicals if their ‘Galois group’ is commutative. In honour of this result,
commutative groups are often called abelian. 

Galois (1831, published posthumously in 1846) introduced the concept of normal 
subgroup and stated a necessary and sufficient condition for a polynomial equation to 
be solvable by radicals. The significance of Galois theory today lies not in this result, 
despite its historical importance, but in the much broader ‘fundamental theorem of
Galois theory’. In the form given it by Dedekind (1894) and Artin (1944), this estab- 
lishes a correspondence between extension fields and groups of automorphisms, and 
provides a framework for the solution of a number of algebraic problems. 

Morandi [54] and Rotman [61] give modern accounts of Galois theory. The histor- 
ical development is traced in Kiernan [43]. In recent years attention has focussed on 
the problem of determining which finite groups occur as Galois groups over a given 
field; for an introductory account, see Matzat [51]. 

Some texts on linear algebra and matrix theory are Halmos [31], Horn and 
Johnson [37], Mal’cev [50] and Gantmacher [28]. 

The older literature on associative algebras is surveyed in Cartan [13]. The texts on 
noncommutative rings cited above give modern introductions. 

A vast number of characterizations of inner product spaces, in addition to the par- 
allelogram law, is given in Amir [3]. The theory of Hilbert space is treated in the books 
of Riesz and Sz.-Nagy [58] and Akhiezer and Glazman [2]. For its roots in the theory 
of integral equations, see Hellinger and Toeplitz [33]. Almost periodic functions are 
discussed from different points of view in Bohr [9], Corduneanu [17] and Maak [49]. 
The convergence of Fourier series is treated in Zygmund [73], for example. 

II


Divisibility 


1 Greatest Common Divisors 


In the set N of all positive integers we can perform two basic operations: addition and 
multiplication. In this chapter we will be primarily concerned with the second
operation.

Multiplication has the following properties: 


(M1) if $ab = ac$, then $b = c$; (cancellation law)
(M2) $ab = ba$ for all $a, b$; (commutative law)
(M3) $(ab)c = a(bc)$ for all $a, b, c$; (associative law)
(M4) $1a = a$ for all $a$. (identity element)


For any $a, b \in \mathbb{N}$ we say that $b$ divides $a$, or that $b$ is a factor of $a$, or that $a$ is a
multiple of $b$, if $a = ba'$ for some $a' \in \mathbb{N}$. We write $b \mid a$ if $b$ divides $a$ and $b \nmid a$ if $b$ does
not divide $a$. For example, $2 \mid 6$, since $6 = 2 \times 3$, but $4 \nmid 6$. (We sometimes use $\times$ instead
of $\cdot$ for the product of positive integers.) The following properties of divisibility follow
at once from the definition:


(i) $a \mid a$ and $1 \mid a$ for every $a$;
(ii) if $b \mid a$ and $c \mid b$, then $c \mid a$;
(iii) if $b \mid a$, then $b \mid ac$ for every $c$;
(iv) $bc \mid ac$ if and only if $b \mid a$;
(v) if $b \mid a$ and $a \mid b$, then $b = a$.


For any $a, b \in \mathbb{N}$ we say that $d$ is a common divisor of $a$ and $b$ if $d \mid a$ and $d \mid b$. We
say that a common divisor $d$ of $a$ and $b$ is a greatest common divisor if every common
divisor of $a$ and $b$ divides $d$. The greatest common divisor of $a$ and $b$ is uniquely
determined, if it exists, and will be denoted by $(a, b)$.

The greatest common divisor of a and b is indeed the numerically greatest com- 
mon divisor. However, it is preferable not to define greatest common divisors in this 
way, since the concept is then available for algebraic structures in which there is no 
relation of magnitude and only the operation of multiplication is defined. 

Proposition 1 Any $a, b \in \mathbb{N}$ have a greatest common divisor $(a, b)$.


Proof Without loss of generality we may suppose $a \ge b$. If $b$ divides $a$, then
$(a, b) = b$. Assume that there exists a pair $a, b$ without greatest common divisor and
choose one for which $a$ is a minimum. Then $1 < b < a$, since $b$ does not divide $a$.
Since also $1 \le a - b < a$, the pair $a - b, b$ has a greatest common divisor $d$. Since
any common divisor of $a$ and $b$ divides $a - b$, and since $d$ divides $(a - b) + b = a$, it
follows that $d$ is a greatest common divisor of $a$ and $b$. But this is a contradiction.


The proof of Proposition 1 uses not only the multiplicative structure of the set N, 
but also its ordering and additive structure. To see that there is a reason for this,
consider the set $S$ of all positive integers of the form $4k + 1$. The set $S$ is closed under
multiplication, since

$$(4j + 1)(4k + 1) = 4(4jk + j + k) + 1,$$

and we can define divisibility and greatest common divisors in $S$ by simply replacing
$\mathbb{N}$ by $S$ in our previous definitions. However, although the elements 693 and 189 of $S$
have the common divisors 9 and 21, they have no greatest common divisor according
to this definition.
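The failure of greatest common divisors in $S$ is easy to confirm by brute force. The Python sketch below (the helper names are our own, purely for illustration) takes ‘$d$ divides $n$ in $S$’ to mean $n = dm$ with $d$, $n$ and $m$ all in $S$, lists the common divisors of 693 and 189 in this sense, and then looks for one that every other common divisor divides within $S$.

```python
def in_S(n):
    """True if n is a positive integer of the form 4k + 1."""
    return n > 0 and n % 4 == 1


def divides_in_S(d, n):
    """d divides n inside S: n = d * m with d, n and m all in S."""
    return in_S(d) and in_S(n) and n % d == 0 and in_S(n // d)


def common_divisors_in_S(a, b):
    """All common divisors of a and b in the sense of S, in increasing order."""
    return [d for d in range(1, min(a, b) + 1)
            if divides_in_S(d, a) and divides_in_S(d, b)]


def gcd_in_S(a, b):
    """A greatest common divisor of a and b in S, or None if there is none."""
    common = common_divisors_in_S(a, b)
    for g in common:
        # g qualifies only if every common divisor divides g within S
        if all(divides_in_S(d, g) for d in common):
            return g
    return None


print(common_divisors_in_S(693, 189))  # [1, 9, 21]
print(gcd_in_S(693, 189))              # None
```

Neither 9 nor 21 divides the other in $S$, and no larger common divisor exists, so the search comes back empty.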

In the following discussion we use the result of Proposition 1, but make no further 
appeal to either addition or order. 

For any $a, b \in \mathbb{N}$ we say that $h$ is a common multiple of $a$ and $b$ if $a \mid h$ and $b \mid h$.
We say that a common multiple $h$ of $a$ and $b$ is a least common multiple if $h$ divides
every common multiple of $a$ and $b$. The least common multiple of $a$ and $b$ is uniquely
determined, if it exists, and will be denoted by $[a, b]$.

It is evident that, for every $a$,

$$(a, 1) = 1, \quad [a, 1] = a, \quad (a, a) = a = [a, a].$$

Proposition 2 Any $a, b \in \mathbb{N}$ have a least common multiple $[a, b]$. Moreover,
$$(a, b)[a, b] = ab.$$
Furthermore, for all $a, b, c \in \mathbb{N}$,
$$(ac, bc) = (a, b)c, \quad [ac, bc] = [a, b]c,$$
$$([a, b], [a, c]) = [a, (b, c)], \quad [(a, b), (a, c)] = (a, [b, c]).$$


Proof We show first that $(ac, bc) = (a, b)c$. Put $d = (a, b)$. Clearly $cd$ is a common
divisor of $ac$ and $bc$, and so $(ac, bc) = qcd$ for some $q \in \mathbb{N}$. Thus $ac = qcda'$,
$bc = qcdb'$ for some $a', b' \in \mathbb{N}$. It follows that $a = qda'$, $b = qdb'$. Thus $qd$ is a
common divisor of $a$ and $b$. Hence $qd$ divides $d$, which implies $q = 1$.

If $g$ is any common multiple of $a$ and $b$, then $ab$ divides $ga$ and $gb$, and hence $ab$
also divides $(ga, gb)$. But, by what we have just proved,

$$(ga, gb) = (a, b)g = dg.$$

Hence $h := ab/d$ divides $g$. Since $h$ is clearly a common multiple of $a$ and $b$, it follows
that $h = [a, b]$. Replacing $a, b$ by $ac, bc$, we now obtain

$$[ac, bc] = acbc/(ac, bc) = abc/(a, b) = hc.$$
If we put
$$A = ([a, b], [a, c]), \quad B = [a, (b, c)],$$
then by what we have already proved,

$$A = (ab/(a, b),\ ac/(a, c)),$$
$$B = a(b, c)/(a, (b, c)) = (ab/(a, (b, c)),\ ac/(a, (b, c))).$$

Since $(a, (b, c))$ divides both $(a, b)$ and $(a, c)$, any common divisor of $ab/(a, b)$ and
$ac/(a, c)$ is also a common divisor of $ab/(a, (b, c))$ and $ac/(a, (b, c))$, and it follows
that $A$ divides $B$. On the other hand, $a$ divides $A$, since $a$ divides $[a, b]$ and $[a, c]$, and
similarly $(b, c)$ divides $A$. Hence $B$ divides $A$. Thus $B = A$.

The remaining statement of the proposition is proved in the same way, with greatest 
common divisors and least common multiples interchanged. 


The last two statements of Proposition 2 are referred to as the distributive laws, 
since if the greatest common divisor and least common multiple of a and b are 
denoted by $a \wedge b$ and $a \vee b$ respectively, they take the form

$$(a \vee b) \wedge (a \vee c) = a \vee (b \wedge c), \quad (a \wedge b) \vee (a \wedge c) = a \wedge (b \vee c).$$


Properties (i), (ii) and (v) at the beginning of the section say that divisibility is a 
partial ordering of the set N with 1 as least element. The existence of greatest common 
divisors and least common multiples says that N is a lattice with respect to this partial
ordering. The distributive laws say that N is actually a distributive lattice. 
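The identities of Proposition 2, including the two distributive laws, can be spot-checked numerically. This is a minimal Python sketch, assuming Python 3.9 or later for `math.lcm`; the function name is hypothetical, and a finite check of course illustrates rather than proves the proposition.

```python
from itertools import product
from math import gcd, lcm  # math.lcm needs Python 3.9+


def check_proposition_2(limit):
    """Assert the identities of Proposition 2 for all a, b, c in 1..limit."""
    for a, b, c in product(range(1, limit + 1), repeat=3):
        assert gcd(a, b) * lcm(a, b) == a * b
        assert gcd(a * c, b * c) == gcd(a, b) * c
        assert lcm(a * c, b * c) == lcm(a, b) * c
        # the two distributive laws
        assert gcd(lcm(a, b), lcm(a, c)) == lcm(a, gcd(b, c))
        assert lcm(gcd(a, b), gcd(a, c)) == gcd(a, lcm(b, c))
    return True


print(check_proposition_2(30))  # True
```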

We say that $a, b \in \mathbb{N}$ are relatively prime, or coprime, if $(a, b) = 1$. Divisibility
properties in this case are much simpler: 


Proposition 3 For any $a, b, c \in \mathbb{N}$ with $(a, b) = 1$,

(i) if $a \mid c$ and $b \mid c$, then $ab \mid c$;
(ii) if $a \mid bc$, then $a \mid c$;
(iii) $(a, bc) = (a, c)$;
(iv) if also $(a, c) = 1$, then $(a, bc) = 1$;
(v) $(a^m, b^n) = 1$ for all $m, n \ge 1$.

Proof To prove (i), note that $[a, b]$ divides $c$ and $[a, b] = ab$. To prove (ii), note that
$a$ divides $(ac, bc) = (a, b)c = c$. To prove (iii), note that any common divisor of $a$
and $bc$ divides $c$, by (ii). Obviously (iii) implies (iv), and (v) follows by induction.


Proposition 4 If $a, b \in \mathbb{N}$ and $(a, b) = 1$, then any divisor of $ab$ can be uniquely
expressed in the form $de$, where $d \mid a$ and $e \mid b$. Conversely, any product of this form is a
divisor of $ab$.


Proof The proof is based on Proposition 3. Suppose $c$ divides $ab$ and put $d = (a, c)$,
$e = (b, c)$. Then $(d, e) = 1$ and hence $de$ divides $c$. If $a = da'$ and $c = dc'$, then
$(a', c') = 1$ and $e \mid c'$. On the other hand, $c' \mid a'b$ and hence $c' \mid b$. Since $e = (b, c)$, it
follows that $c' = e$ and $c = de$.
Suppose $de = d'e'$, where $d, d'$ divide $a$ and $e, e'$ divide $b$. Then $d \mid d'$, since
$(d, e') = 1$, and similarly $d' \mid d$, since $(d', e) = 1$. Hence $d' = d$ and $e' = e$.
The final statement of the proposition is obvious.

It follows from Proposition 4 that if $c^n = ab$, where $(a, b) = 1$, then $a = d^n$ and
$b = e^n$ for some $d, e \in \mathbb{N}$.

The greatest common divisor and least common multiple of any finite set of ele- 
ments of N may be defined in the same way as for sets of two elements. By induction 
we easily obtain: 


Proposition 5 Any $a_1, \dots, a_n \in \mathbb{N}$ have a greatest common divisor $(a_1, \dots, a_n)$ and
a least common multiple $[a_1, \dots, a_n]$. Moreover,

(i) $(a_1, a_2, \dots, a_n) = (a_1, (a_2, \dots, a_n))$, $[a_1, a_2, \dots, a_n] = [a_1, [a_2, \dots, a_n]]$;
(ii) $(a_1c, \dots, a_nc) = (a_1, \dots, a_n)c$, $[a_1c, \dots, a_nc] = [a_1, \dots, a_n]c$;
(iii) $(a_1, \dots, a_n) = a/[a/a_1, \dots, a/a_n]$, $[a_1, \dots, a_n] = a/(a/a_1, \dots, a/a_n)$, where
$a = a_1 \cdots a_n$.

We can use the distributive laws to show that
$$([a, b], [a, c], [b, c]) = [(a, b), (a, c), (b, c)].$$

In fact the left side is equal to $\{a \vee (b \wedge c)\} \wedge (b \vee c)$, whereas the right side is equal to

$$(b \wedge c) \vee \{a \wedge (b \vee c)\} = \{(b \wedge c) \vee a\} \wedge \{(b \wedge c) \vee (b \vee c)\}$$
$$= \{a \vee (b \wedge c)\} \wedge (b \vee c).$$
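The same kind of spot check applies to the three-term identity just proved. This sketch assumes Python 3.9 or later, where `math.gcd` and `math.lcm` accept any number of arguments; the helper name is our own.

```python
from math import gcd, lcm  # multi-argument gcd/lcm need Python 3.9+


def three_term_identity_holds(a, b, c):
    """([a, b], [a, c], [b, c]) == [(a, b), (a, c), (b, c)]."""
    left = gcd(lcm(a, b), lcm(a, c), lcm(b, c))
    right = lcm(gcd(a, b), gcd(a, c), gcd(b, c))
    return left == right


assert all(three_term_identity_holds(a, b, c)
           for a in range(1, 25) for b in range(1, 25) for c in range(1, 25))
print(three_term_identity_holds(12, 18, 30))  # True
```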


If
$$a = (a_1, \dots, a_m), \quad b = (b_1, \dots, b_n),$$
then $ab$ is the greatest common divisor of all products $a_jb_k$, since $(a_jb_1, \dots, a_jb_n) = a_jb$
and $(a_1b, \dots, a_mb) = ab$.
Similarly, if
$$a = [a_1, \dots, a_m], \quad b = [b_1, \dots, b_n],$$
then $ab$ is the least common multiple of all products $a_jb_k$.
It is easily shown by induction that if $(a_i, a_j) = 1$ for $1 \le i < j \le m$, then

$$(a_1 \cdots a_m, c) = (a_1, c) \cdots (a_m, c), \quad [a_1 \cdots a_m, c] = [a_1, \dots, a_m, c].$$
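
Proposition 6 If $a \in \mathbb{N}$ has two factorizations
$$a = b_1 \cdots b_m = c_1 \cdots c_n,$$
then these factorizations have a common refinement, i.e. there exist $d_{jk} \in \mathbb{N}$
$(1 \le j \le m,\ 1 \le k \le n)$ such that

$$b_j = \prod_{k=1}^{n} d_{jk}, \quad c_k = \prod_{j=1}^{m} d_{jk}.$$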

Proof We show first that if $a = a_1 \cdots a_n$ and $d \mid a$, then $d = d_1 \cdots d_n$, where $d_i \mid a_i$
$(1 \le i \le n)$. We may suppose that $n > 1$ and that the assertion holds for products of
less than $n$ elements of $\mathbb{N}$. Put $a' = a_1 \cdots a_{n-1}$ and $d' = (a', d)$. Then
$d' = d_1 \cdots d_{n-1}$, where $d_i \mid a_i$ $(1 \le i < n)$. Moreover $a'' = a'/d'$ and $d'' = d/d'$
are coprime. Since $d'' = d/d'$ divides $a''a_n = a/d'$, the greatest common divisor
$a_n = (a_na'', a_nd'')$ is divisible by $d''$. Thus we can take $d_n = d''$.
We return now to the proposition. Since $c_1 \mid \prod_j b_j$, we can write $c_1 = \prod_j d_{j1}$,
where $d_{j1} \mid b_j$. Put $b'_j = b_j/d_{j1}$. Then

$$\prod_j b'_j = a/c_1 = c_2 \cdots c_n.$$

Hence we can write $c_2 = \prod_j d_{j2}$, where $d_{j2} \mid b'_j$. Proceeding in this way, we obtain
factorizations $c_k = \prod_j d_{jk}$ such that $\prod_k d_{jk}$ divides $b_j$. In fact, since

$$\prod_{j,k} d_{jk} = a = \prod_j b_j,$$

we must have $b_j = \prod_k d_{jk}$.
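The proof of Proposition 6 is constructive. In the Python sketch below, which is our own illustration with hypothetical names, each $c_k$ is split across the remaining quotients $b'_j$ by taking $d_{jk}$ to be the greatest common divisor of $b'_j$ and what is left of $c_k$, one $j$ at a time. This greedy choice is one way to realize the first step of the proof: after removing that factor, the rest of $c_k$ is coprime to what is left of $b'_j$ and so, by Proposition 3(ii), divides the product of the remaining $b'_i$.

```python
from math import gcd, prod  # math.prod needs Python 3.8+


def common_refinement(bs, cs):
    """Given prod(bs) == prod(cs), return a matrix d with
    bs[j] == prod(d[j]) and cs[k] == prod(d[j][k] over j)."""
    assert prod(bs) == prod(cs)
    rest = list(bs)                    # the quotients b'_j still available
    d = [[1] * len(cs) for _ in bs]
    for k, c in enumerate(cs):
        remaining = c
        for j in range(len(bs)):
            djk = gcd(rest[j], remaining)  # take as much of c_k as b'_j allows
            d[j][k] = djk
            rest[j] //= djk
            remaining //= djk
        assert remaining == 1          # c_k = product of the d_jk over j
    return d


print(common_refinement([12, 10], [8, 15]))  # [[4, 3], [2, 5]]
```

Here $12 \cdot 10 = 8 \cdot 15$ is refined as $12 = 4 \cdot 3$, $10 = 2 \cdot 5$, $8 = 4 \cdot 2$, $15 = 3 \cdot 5$.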


Instead of defining divisibility and greatest common divisors in the set $\mathbb{N}$ of all
positive integers, we can define them in the set $\mathbb{Z}$ of all integers by simply replacing
$\mathbb{N}$ by $\mathbb{Z}$ in the previous definitions. The properties (i)–(v) at the beginning of this
section continue to hold, provided that in (iv) we require $c \ne 0$ and in (v) we alter the
conclusion to $b = \pm a$. We now list some additional properties:


(i)′ $a \mid 0$ for every $a$;
(ii)′ if $0 \mid a$, then $a = 0$;
(iii)′ if $c \mid a$ and $c \mid b$, then $c \mid ax + by$ for all $x, y$.


Greatest common divisors and least common multiples still exist, but uniqueness
holds only up to sign. With this understanding, Propositions 2–4 continue to hold, and
so also do Propositions 5 and 6 if we require $a \ne 0$. It is evident that, for every $a$,

$$(a, 0) = a, \quad [a, 0] = 0.$$


More generally, we can define divisibility in any integral domain, i.e. a commutative
ring in which $a \ne 0$ and $b \ne 0$ together imply $ab \ne 0$. The properties (i)–(v) at
the beginning of the section continue to hold, provided that in (iv) we require $c \ne 0$
and in (v) we alter the conclusion to $b = ua$, where $u$ is a unit, i.e. $u \mid 1$. The properties
(i)′–(iii)′ above also remain valid.

We define a GCD domain to be an integral domain in which any pair of elements 
has a greatest common divisor. This implies that any pair of elements also has a least 
common multiple. Uniqueness now holds only up to unit multiples. With this
understanding Propositions 2–6 continue to hold in any GCD domain in the same way as
for Z. 

An important example, which we will consider in Section 3, of a GCD domain 
other than $\mathbb{Z}$ is the polynomial ring $K[t]$, consisting of all polynomials in $t$ with
coefficients from an arbitrary field $K$. The units in this case are the nonzero elements of $K$.

Another example, which we will meet in §4 of Chapter VI, is the valuation ring $R$
of a non-archimedean valued field. In this case, for any $a, b \in R$, either $a \mid b$ or $b \mid a$ and
so $(a, b)$ is either $a$ or $b$.

In the same way that the ring Z of integers may be embedded in the field Q of 
rational numbers, any integral domain R may be embedded in a field K, its field of 
fractions, so that any nonzero $c \in K$ has the form $c = ab^{-1}$, where $a, b \in R$ and
$b \ne 0$. If $R$ is a GCD domain we can further require $(a, b) = 1$, and $a, b$ are then
uniquely determined apart from a common unit multiple. The field of fractions of the
polynomial ring $K[t]$ is the field $K(t)$ of rational functions.

In our discussion of divisibility so far we have avoided all mention of prime
numbers. A positive integer $a \ne 1$ is said to be prime if 1 and $a$ are its only positive
divisors, and otherwise is said to be composite.

For example, 2, 3 and 5 are primes, but 4 = 2 x 2 and 6 = 2 x 3 are composite. 
The significance of the primes is that, as far as multiplication is concerned, they are 
the ‘atoms’ and the composite integers are the ‘molecules’. This is made precise in the 
following so-called fundamental theorem of arithmetic: 


Proposition 7 If $a \in \mathbb{N}$ and $a \ne 1$, then $a$ can be represented as a product of
finitely many primes. Moreover, the representation is unique, except for the order of
the factors.


Proof Assume, on the contrary, that some composite $a_1 \in \mathbb{N}$ is not a product of
finitely many primes. Since $a_1$ is composite, it has a factorization $a_1 = a_2b_2$, where
$a_2, b_2 \in \mathbb{N}$ and $a_2, b_2 \ne 1$. At least one of $a_2, b_2$ must be composite and not a product
of finitely many primes, and we may choose the notation so that $a_2$ has these
properties. The preceding argument can now be repeated with $a_2$ in place of $a_1$. Proceeding in
this way, we obtain an infinite sequence $(a_k)$ of positive integers such that $a_{k+1}$ divides
$a_k$ and $a_{k+1} \ne a_k$ for each $k \ge 1$. But then the sequence $(a_k)$ has no least element,
which contradicts Proposition I.3.
Suppose now that

$$a = p_1 \cdots p_m = q_1 \cdots q_n$$

are two representations of $a$ as a product of primes. Then, by Proposition 6, there exist
$d_{jk} \in \mathbb{N}$ $(1 \le j \le m,\ 1 \le k \le n)$ such that

$$p_j = \prod_{k=1}^{n} d_{jk}, \quad q_k = \prod_{j=1}^{m} d_{jk}.$$

Since $p_1$ is a prime, we must have $d_{1k_1} = p_1$ for some $k_1 \in \{1, \dots, n\}$, and since $q_{k_1}$
is a prime, we must have $q_{k_1} = d_{1k_1} = p_1$. The same argument can now be applied to

$$a' = \prod_{j \ne 1} p_j = \prod_{k \ne k_1} q_k.$$

It follows that $m = n$ and $q_1, \dots, q_n$ is a permutation of $p_1, \dots, p_m$.

It should be noted that factorization into primes would not be unique if we
admitted 1 as a prime. The fundamental theorem of arithmetic may be reformulated in the
following way: any $a \in \mathbb{N}$ can be uniquely represented in the form

$$a = \prod_p p^{\alpha_p},$$

where $p$ runs through the primes and the $\alpha_p$ are non-negative integers, only finitely
many of which are nonzero. It is easily seen that if $b \in \mathbb{N}$ has the analogous
representation

$$b = \prod_p p^{\beta_p},$$

then $b \mid a$ if and only if $\beta_p \le \alpha_p$ for all $p$. It follows that the greatest common divisor
and least common multiple of $a$ and $b$ have the representations

$$(a, b) = \prod_p p^{\gamma_p}, \quad [a, b] = \prod_p p^{\delta_p},$$

where

$$\gamma_p = \min\{\alpha_p, \beta_p\}, \quad \delta_p = \max\{\alpha_p, \beta_p\}.$$
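The min/max description of $(a, b)$ and $[a, b]$ translates directly into code. This is a minimal Python sketch, assuming factorization by naive trial division (adequate only for small numbers) and Python 3.9 or later for `math.lcm`, which is used here only to cross-check the result; the function names are ours.

```python
from collections import Counter
from math import gcd, lcm, prod  # math.lcm needs Python 3.9+


def factorize(a):
    """Exponents alpha_p with a = product of p**alpha_p, by trial division (a >= 1)."""
    exponents = Counter()
    p = 2
    while p * p <= a:
        while a % p == 0:
            exponents[p] += 1
            a //= p
        p += 1
    if a > 1:
        exponents[a] += 1  # what is left has no factor up to its square root: a prime
    return exponents


def gcd_lcm_from_exponents(a, b):
    """(a, b) and [a, b] via gamma_p = min(alpha_p, beta_p), delta_p = max(alpha_p, beta_p)."""
    alpha, beta = factorize(a), factorize(b)
    primes = set(alpha) | set(beta)  # a missing prime has exponent 0 (Counter default)
    g = prod(p ** min(alpha[p], beta[p]) for p in primes)
    l = prod(p ** max(alpha[p], beta[p]) for p in primes)
    return g, l


print(dict(factorize(360)))             # {2: 3, 3: 2, 5: 1}
print(gcd_lcm_from_exponents(360, 84))  # (12, 2520)
assert all(gcd_lcm_from_exponents(a, b) == (gcd(a, b), lcm(a, b))
           for a in range(1, 100) for b in range(1, 100))
```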


The fundamental theorem of arithmetic extends at once from $\mathbb{N}$ to $\mathbb{Q}$: any nonzero
rational number $a$ can be uniquely represented in the form

$$a = u \prod_p p^{\alpha_p},$$

where $u = \pm 1$ is a unit, $p$ runs through the primes and the $\alpha_p$ are integers (not
necessarily non-negative), only finitely many of which are nonzero.

The following property of primes was already established in Euclid’s Elements 
(Book VII, Proposition 30): 


Proposition 8 If $p$ is a prime and $p \mid bc$, then $p \mid b$ or $p \mid c$.


Proof If $p$ does not divide $b$, we must have $(p, b) = 1$. But then $p$ divides $c$, by
Proposition 3(ii).


The property in Proposition 8 actually characterizes primes. For if $a$ is composite,
then $a = bc$, where $b, c \ne 1$. Thus $a \mid bc$, but $a \nmid b$ and $a \nmid c$.

We consider finally the extension of these notions to an arbitrary integral domain $R$.
For any nonzero $a, b \in R$, we say that a divisor $b$ of $a$ is a proper divisor if $a$ does
not divide $b$ (i.e., if $a$ and $b$ do not differ only by a unit factor). We say that $p \in R$ is
irreducible if $p$ is neither zero nor a unit and if every proper divisor of $p$ is a unit. We
say that $p \in R$ is prime if $p$ is neither zero nor a unit and if $p \mid bc$ implies $p \mid b$ or $p \mid c$.

By what we have just said, the notions of ‘prime’ and ‘irreducible’ coincide if
$R = \mathbb{Z}$, and the same argument applies if $R$ is any GCD domain. However, in an
arbitrary integral domain $R$, although any prime element is irreducible, an irreducible
element need not be prime. (For example, in the integral domain $R$ consisting of all
complex numbers of the form $a + b\sqrt{-5}$, where $a, b \in \mathbb{Z}$, it may be seen that
$6 = 2 \times 3 = (1 + \sqrt{-5})(1 - \sqrt{-5})$ has two essentially distinct factorizations into
irreducibles, and thus none of these irreducibles is prime.)
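The example can be checked with the norm $N(x + y\sqrt{-5}) = x^2 + 5y^2$, which is multiplicative and equals 1 only for the units $\pm 1$. The elements $2$, $3$, $1 \pm \sqrt{-5}$ have norms 4, 9, 6, 6, so a factorization of any of them into non-units would need an element of norm 2 or 3, and there is none. On the other hand 2 divides $6 = (1 + \sqrt{-5})(1 - \sqrt{-5})$ without dividing either factor. The sketch below represents $x + y\sqrt{-5}$ as the pair `(x, y)`; this representation and the function names are our own.

```python
def norm(x, y):
    """Norm of x + y*sqrt(-5): x**2 + 5*y**2 (multiplicative)."""
    return x * x + 5 * y * y


def mul(u, v):
    """Product of u = (x1, y1) and v = (x2, y2), using sqrt(-5)**2 = -5."""
    (x1, y1), (x2, y2) = u, v
    return (x1 * x2 - 5 * y1 * y2, x1 * y2 + x2 * y1)


def has_element_of_norm(n):
    """Exhaustive search for x, y with x**2 + 5*y**2 == n (small n only)."""
    r = int(n ** 0.5) + 1
    return any(norm(x, y) == n for x in range(-r, r + 1) for y in range(-r, r + 1))


def divides(u, v):
    """u | v: v/u = v * conj(u) / N(u) must have integer coordinates."""
    x, y = mul(v, (u[0], -u[1]))
    n = norm(*u)
    return x % n == 0 and y % n == 0


two, three = (2, 0), (3, 0)
plus, minus = (1, 1), (1, -1)  # 1 + sqrt(-5) and 1 - sqrt(-5)
assert mul(two, three) == mul(plus, minus) == (6, 0)
print(has_element_of_norm(2), has_element_of_norm(3))  # False False
print(divides(two, plus), divides(two, minus))         # False False
```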

The proof of Proposition 7 shows that, in an arbitrary integral domain R, every 
element which is neither zero nor a unit can be represented as a product of finitely 
many irreducible elements if and only if the following chain condition is satisfied: 


(#) there exists no infinite sequence $(a_n)$ of elements of $R$ such that $a_{n+1}$ is a proper
divisor of $a_n$ for every $n$.


Furthermore, the representation is essentially unique (i.e. unique except for the order 
of the factors and for multiplying them by units) if and only if R is also a GCD domain. 

An integral domain R is said to be factorial (or a ‘unique factorization domain’) 
if the ‘fundamental theorem of arithmetic’ holds in R, i.e. if every element which is 
neither zero nor a unit has such an essentially unique representation as a product of 
finitely many irreducibles. By the above remarks, an integral domain R is factorial if 
and only if it is a GCD domain satisfying the chain condition (#). 

For future use, we define an element of a factorial domain to be square-free if it 
is neither zero nor a unit and if, in its representation as a product of irreducibles, no 
factor is repeated. In particular, a positive integer is square-free if and only if it is a 
nonempty product of distinct primes. 


2 The Bézout Identity 


If $a, b$ are arbitrary integers with $a \ne 0$, then there exist unique integers $q, r$ such that
$$b = qa + r, \quad 0 \le r < |a|.$$


In fact $qa$ is the greatest multiple of $a$ which does not exceed $b$. The integers $q$ and $r$
are called the quotient and remainder in the ‘division’ of $b$ by $a$.

(For $a > 0$ this was proved in Proposition I.14. It follows that if $a$ and $n$ are positive
integers, any positive integer $b$ less than $a^n$ has a unique representation ‘to the base $a$’:

$$b = b_0 + b_1a + \cdots + b_{n-1}a^{n-1},$$

where $0 \le b_j < a$ for all $j$. In fact $b_{n-1}$ is the quotient in the division of $b$ by $a^{n-1}$,
$b_{n-2}$ is the quotient in the division of the remainder by $a^{n-2}$, and so on.)
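The recipe for the base-$a$ digits (divide by $a^{n-1}$, then divide the remainder by $a^{n-2}$, and so on) is sketched below in Python. The function name is ours; we assume $a \ge 2$, and we also allow $b = 0$, which simply has all digits zero.

```python
def digits_to_base(b, a, n):
    """Digits [b_0, ..., b_{n-1}] of 0 <= b < a**n in base a (a >= 2):
    b_{n-1} is the quotient of b by a**(n-1), and so on down to b_0."""
    assert a >= 2 and 0 <= b < a ** n
    digits = [0] * n
    r = b
    for j in range(n - 1, -1, -1):
        digits[j], r = divmod(r, a ** j)  # quotient is b_j; continue with the remainder
    return digits


print(digits_to_base(2171, 10, 4))  # [1, 7, 1, 2]
```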
If $a, b$ are arbitrary integers with $a \ne 0$, then there exist also integers $q, r$ such that


$$b = qa + r, \quad |r| \le |a|/2.$$


In fact $qa$ is the nearest multiple of $a$ to $b$. Thus $q$ and $r$ are not uniquely determined
if $b$ is midway between two consecutive multiples of $a$.
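Both division algorithms are easy to express in Python, with one caveat: the built-in `divmod` returns a remainder with the sign of the divisor, so for $a < 0$ it must be adjusted to obtain $0 \le r < |a|$. This is a minimal sketch with names of our own; in the midway case the nearest-multiple version keeps the remainder $r = |a|/2$, which is just one of the two admissible choices.

```python
def divide_least_nonnegative(b, a):
    """q, r with b = q*a + r and 0 <= r < |a| (a != 0)."""
    q, r = divmod(b, a)  # Python: r has the sign of a and |r| < |a|
    if r < 0:            # only possible when a < 0
        q, r = q + 1, r - a
    return q, r


def divide_nearest(b, a):
    """q, r with b = q*a + r and |r| <= |a|/2 (a != 0)."""
    q, r = divide_least_nonnegative(b, a)
    if 2 * r > abs(a):   # the next multiple of a is nearer to b
        q, r = (q + 1, r - a) if a > 0 else (q - 1, r + a)
    return q, r


print(divide_least_nonnegative(-7, 3), divide_nearest(-7, 3))  # (-3, 2) (-2, -1)
```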

Both these division algorithms have their uses. We will be impartial and merely 
use the fact that 


$$b = qa + r, \quad |r| < |a|.$$


An ideal in the commutative ring $\mathbb{Z}$ of all integers is defined to be a nonempty
subset $J$ such that if $a, b \in J$ and $x, y \in \mathbb{Z}$, then also $ax + by \in J$.

For example, if $a_1, \dots, a_n$ are given elements of $\mathbb{Z}$, then the set of all linear
combinations $a_1x_1 + \cdots + a_nx_n$ with $x_1, \dots, x_n \in \mathbb{Z}$ is an ideal, the ideal generated by
$a_1, \dots, a_n$. An ideal generated by a single element, i.e. the set of all multiples of that
element, is said to be a principal ideal.


Lemma 9 Any ideal J in the ring Z is a principal ideal. 


Proof If 0 is the only element of $J$, then 0 generates $J$. Otherwise there is a nonzero
$a \in J$ with minimum absolute value. For any $b \in J$, we can write $b = qa + r$, for
some $q, r \in \mathbb{Z}$ with $|r| < |a|$. By the definition of an ideal, $r \in J$ and so, by the
definition of $a$, $r = 0$. Thus $a$ generates $J$.


Proposition 10 Any $a, b \in \mathbb{Z}$ have a greatest common divisor $d = (a, b)$. Moreover,
for any $c \in \mathbb{Z}$, there exist $x, y \in \mathbb{Z}$ such that


$$ax + by = c$$
if and only if $d$ divides $c$.


Proof Let $J$ be the ideal generated by $a$ and $b$. By Lemma 9, $J$ is generated by a
single element $d$. Since $a, b \in J$, $d$ is a common divisor of $a$ and $b$. On the other hand,
since $d \in J$, there exist $u, v \in \mathbb{Z}$ such that $d = au + bv$. Hence any common divisor of
$a$ and $b$ also divides $d$. Thus $d = (a, b)$. The final statement of the proposition follows
immediately since, by definition, $c \in J$ if and only if there exist $x, y \in \mathbb{Z}$ such that
$ax + by = c$.


It is readily shown that if the ‘linear Diophantine’ equation $ax + by = c$ has a
solution $x_0, y_0 \in \mathbb{Z}$, then all solutions $x, y \in \mathbb{Z}$ are given by the formula


$$x = x_0 + kb/d, \quad y = y_0 - ka/d,$$


where $d = (a, b)$ and $k$ is an arbitrary integer.
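The description of all solutions can be confirmed on a finite box by comparing an exhaustive search with the formula. In this Python sketch the equation $6x + 10y = 8$ and its particular solution $(3, -1)$ are our own example, the function names are hypothetical, and $a, b$ are assumed not both zero.

```python
from math import gcd


def solutions_in_box(a, b, c, bound):
    """All (x, y) with |x|, |y| <= bound and a*x + b*y == c, by exhaustive search."""
    rng = range(-bound, bound + 1)
    return {(x, y) for x in rng for y in rng if a * x + b * y == c}


def solutions_from_formula(a, b, x0, y0, bound):
    """Points (x0 + k*b/d, y0 - k*a/d), d = (a, b), lying in the same box."""
    d = gcd(a, b)
    k_max = bound + abs(x0) + abs(y0)  # for larger |k| the point has left the box
    points = set()
    for k in range(-k_max, k_max + 1):
        x, y = x0 + k * (b // d), y0 - k * (a // d)
        if abs(x) <= bound and abs(y) <= bound:
            points.add((x, y))
    return points


a, b, c = 6, 10, 8   # d = 2 divides c = 8, so solutions exist
x0, y0 = 3, -1       # one particular solution: 18 - 10 = 8
assert a * x0 + b * y0 == c
assert solutions_in_box(a, b, c, 20) == solutions_from_formula(a, b, x0, y0, 20)
```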

Proposition 10 provides a new proof for the existence of greatest common divisors 
and, in addition, it shows that the greatest common divisor of two integers can be rep- 
resented as a linear combination of them. This representation is usually referred to as 
the Bézout identity, although it was already known to Bachet (1624) and even earlier 
to the Hindu mathematicians Aryabhata (499) and Brahmagupta (628). 

In exactly the same way that we proved Proposition 10 — or, alternatively, by 
induction from Proposition 10 — we can prove 


Proposition 11 Any finite set $a_1, \dots, a_n$ of elements of $\mathbb{Z}$ has a greatest common
divisor $d = (a_1, \dots, a_n)$. Moreover, for any $c \in \mathbb{Z}$, there exist $x_1, \dots, x_n \in \mathbb{Z}$ such that


$$a_1x_1 + \cdots + a_nx_n = c$$
if and only if $d$ divides $c$.


The proof which we gave for Proposition 10 is a pure existence proof — it does 
not help us to find the greatest common divisor. The following constructive proof was 
already given in Euclid’s Elements (Book VII, Proposition 2). Let a,b be arbitrary 
integers. Since $(0, b) = b$, we may assume $a \ne 0$. Then there exist integers $q, r$ such
that


$$b = qa + r, \quad |r| < |a|.$$
Put $a_0 = b$, $a_1 = a$ and repeatedly apply this procedure:

$$\begin{aligned}
a_0 &= q_1a_1 + a_2, & |a_2| &< |a_1|,\\
a_1 &= q_2a_2 + a_3, & |a_3| &< |a_2|,\\
&\ \ \vdots\\
a_{N-2} &= q_{N-1}a_{N-1} + a_N, & |a_N| &< |a_{N-1}|,\\
a_{N-1} &= q_Na_N.
\end{aligned}$$


The process must eventually terminate as shown, because otherwise we would obtain
an infinite sequence $|a_1| > |a_2| > \cdots$ of positive integers with no least element. We
claim that $a_N$ is a greatest common divisor of $a$ and $b$. In fact, working forwards from
the first equation we see that any common divisor $c$ of $a$ and $b$ divides each $a_k$ and so,
in particular, $a_N$. On the other hand, working backwards from the last equation we see
that $a_N$ divides each $a_k$ and so, in particular, $a$ and $b$.

The Bézout identity can also be obtained in this way, although Euclid himself 
lacked the necessary algebraic notation. Define sequences $(x_k)$, $(y_k)$ by the recurrence
relations


$$x_{k+1} = x_{k-1} - q_kx_k, \quad y_{k+1} = y_{k-1} - q_ky_k \quad (1 \le k < N),$$
with the starting values
$$x_0 = 0,\ x_1 = 1, \quad \text{resp.} \quad y_0 = 1,\ y_1 = 0.$$


It is easily shown by induction that $a_k = ax_k + by_k$, and so, in particular,
$a_N = ax_N + by_N$.

The Euclidean algorithm is quite practical. For example, the reader may use it to 
verify that 13 is the greatest common divisor of 2171 and 5317, and that 


$$49 \times 5317 - 120 \times 2171 = 13.$$
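The Euclidean algorithm, together with the recurrences for $x_k$ and $y_k$, can be sketched in Python as follows. Python's `divmod` returns a remainder with the sign of the divisor, which satisfies $|r| < |a|$, so it is one admissible choice of the division step. The function name is ours, and for negative inputs the result is a greatest common divisor only up to sign.

```python
def euclid_bezout(a, b):
    """Return (a_N, x_N, y_N) with a_N a greatest common divisor of a and b
    (up to sign) and a_N = a*x_N + b*y_N, following the recurrences in the text."""
    if a == 0:
        return b, 0, 1                 # (0, b) = b = a*0 + b*1
    r = [b, a]                         # a_0 = b, a_1 = a
    x = [0, 1]                         # x_0 = 0, x_1 = 1
    y = [1, 0]                         # y_0 = 1, y_1 = 0
    while True:
        q, rem = divmod(r[-2], r[-1])  # a_{k-1} = q_k a_k + a_{k+1}, |a_{k+1}| < |a_k|
        if rem == 0:                   # a_{N-1} = q_N a_N: the process terminates
            return r[-1], x[-1], y[-1]
        r.append(rem)
        x.append(x[-2] - q * x[-1])    # x_{k+1} = x_{k-1} - q_k x_k
        y.append(y[-2] - q * y[-1])    # y_{k+1} = y_{k-1} - q_k y_k


print(euclid_bezout(2171, 5317))  # (13, -120, 49)
```

On the example above it returns $a_N = 13$ with $x_N = -120$ and $y_N = 49$, reproducing the identity $49 \times 5317 - 120 \times 2171 = 13$.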


However, the first proof given for Proposition 10 also has its uses: there is some
advantage in separating the conceptual from the computational, and the proof actually
rests on more general principles, since there are quadratic number fields whose ring of
integers is a ‘principal ideal domain’ that does not possess any Euclidean algorithm.

It is not visibly obvious that the binomial coefficients


$${}^{m+n}C_n = (m + 1) \cdots (m + n)/1 \cdot 2 \cdots n$$


are integers for all positive integers $m, n$, although it is apparent from their
combinatorial interpretation. However, the property is readily proved by induction, using the
relation


$${}^{m+n}C_n = {}^{m+n-1}C_n + {}^{m+n-1}C_{n-1}.$$

Binomial coefficients have other arithmetic properties. Hermite observed that ${}^{m+n}C_n$
is divisible by the integers $(m + n)/(m, n)$ and $(m + 1)/(m + 1, n)$. In particular, the
Catalan numbers $(n + 1)^{-1}\,{}^{2n}C_n$ are integers. The following proposition is a
substantial generalization of these results and illustrates the application of Proposition 10.
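Hermite's divisibility statements and the integrality of the Catalan numbers are easy to spot-check with Python's `math.comb` (Python 3.8 or later), writing ${}^{m+n}C_n$ as `comb(m + n, n)`. The helper name is ours and the ranges are arbitrary; a finite check only illustrates the claims.

```python
from math import comb, gcd


def hermite_divisibility_holds(m, n):
    """(m+n)/(m, n) and (m+1)/(m+1, n) both divide C(m+n, n)."""
    c = comb(m + n, n)
    return c % ((m + n) // gcd(m, n)) == 0 and c % ((m + 1) // gcd(m + 1, n)) == 0


assert all(hermite_divisibility_holds(m, n) for m in range(1, 40) for n in range(1, 40))
assert all(comb(2 * n, n) % (n + 1) == 0 for n in range(1, 200))
catalan = [comb(2 * n, n) // (n + 1) for n in range(1, 8)]
print(catalan)  # [1, 2, 5, 14, 42, 132, 429]
```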


Proposition 12 Let $(a_n)$ be a sequence of nonzero integers such that, for all $m, n \ge 1$,
every common divisor of $a_m$ and $a_n$ divides $a_{m+n}$, and every common divisor of $a_m$
and $a_{m+n}$ divides $a_n$. Then, for all $m, n \ge 1$,


(i) $(a_m, a_n) = a_{(m,n)}$;
(ii) $A_{m,n} := a_{m+1} \cdots a_{m+n}/a_1 \cdots a_n \in \mathbb{Z}$;
(iii) $A_{m,n}$ is divisible by $a_{m+n}/(a_m, a_n)$, by $a_{m+1}/(a_{m+1}, a_n)$ and by $a_{n+1}/(a_m, a_{n+1})$;
(iv) $(A_{m,n-1}, A_{m+1,n}, A_{m-1,n+1}) = (A_{m-1,n}, A_{m+1,n-1}, A_{m,n+1})$.


Proof The hypotheses imply that
$$(a_m, a_n) = (a_m, a_{m+n}) \quad \text{for all } m, n \ge 1.$$
Since $a_m = (a_m, a_m)$, it follows by induction that $a_m \mid a_{km}$ for all $k \ge 1$. Moreover,


$$(a_{km}, a_{(k+1)m}) = a_m,$$


since every common divisor of $a_{km}$ and $a_{(k+1)m}$ divides $a_m$.

Put $d = (m, n)$. Then $m = dm'$, $n = dn'$, where $(m', n') = 1$. Thus there exist
integers $u, v$ such that $m'u - n'v = 1$. By replacing $u, v$ by $u + tn'$, $v + tm'$ with any
$t > \max\{|u|, |v|\}$, we may assume that $u$ and $v$ are both positive. Then


$$(a_{mu}, a_{nv}) = (a_{(n'v+1)d}, a_{n'vd}) = a_d.$$


Since $a_d$ divides $(a_m, a_n)$ and $(a_m, a_n)$ divides $(a_{mu}, a_{nv})$, this implies $(a_m, a_n) = a_d$.
This proves (i).

Since $a_1 \mid a_{m+1}$, it is evident that $A_{m,1} \in \mathbb{Z}$ for all $m \ge 1$. We assume that $n > 1$ and
$A_{m,n} \in \mathbb{Z}$ for all smaller values of $n$ and all $m \ge 1$. Since it is trivial that $A_{0,n} \in \mathbb{Z}$, we
assume also that $m \ge 1$ and $A_{m,n} \in \mathbb{Z}$ for all smaller values of $m$. By Proposition 10,
there exist $x, y \in \mathbb{Z}$ such that


$$a_mx + a_ny = a_{m+n},$$


since $(a_m, a_n)$ divides $a_{m+n}$. Since


$$A_{m,n} = \frac{a_{m+1} \cdots a_{m+n}}{a_1 \cdots a_n} = x\,\frac{a_ma_{m+1} \cdots a_{m+n-1}}{a_1 \cdots a_n} + y\,\frac{a_{m+1} \cdots a_{m+n-1}}{a_1 \cdots a_{n-1}} = xA_{m-1,n} + yA_{m,n-1},$$


our induction hypotheses imply that $A_{m,n} \in \mathbb{Z}$. This proves (ii).
Since


$$a_{m+n}A_{m,n-1} = a_nA_{m,n},$$


$a_{m+n}$ divides $(a_n, a_{m+n})A_{m,n}$ and, since $(a_n, a_{m+n}) = (a_m, a_n)$, this in turn implies
that $a_{m+n}/(a_m, a_n)$ divides $A_{m,n}$.
Similarly, since
$$a_{m+1}A_{m+1,n} = a_{m+n+1}A_{m,n}, \quad a_{m+1}A_{m+1,n-1} = a_nA_{m,n},$$


$a_{m+1}$ divides $(a_n, a_{m+n+1})A_{m,n}$ and, since $(a_n, a_{m+n+1}) = (a_{m+1}, a_n)$, it follows that
$a_{m+1}/(a_{m+1}, a_n)$ divides $A_{m,n}$. In the same way, since


$$a_{n+1}A_{m,n+1} = a_{m+n+1}A_{m,n}, \quad a_{n+1}A_{m-1,n+1} = a_mA_{m,n},$$


$a_{n+1}$ divides $(a_m, a_{m+n+1})A_{m,n}$ and hence $a_{n+1}/(a_m, a_{n+1})$ divides $A_{m,n}$. This
proves (iii).
By multiplying by $a_1 \cdots a_{n+1}/a_{m+2} \cdots a_{m+n-1}$, we see that (iv) is equivalent to


$$(a_na_{n+1}a_{m+1},\ a_{n+1}a_{m+n}a_{m+n+1},\ a_ma_{m+1}a_{m+n})$$
$$= (a_{n+1}a_ma_{m+1},\ a_na_{n+1}a_{m+n},\ a_{m+1}a_{m+n}a_{m+n+1}).$$


Since here the two sides are interchanged when $m$ and $n$ are interchanged, it is
sufficient to show that any common divisor $e$ of the three terms on the right is also a
common divisor of the three terms on the left. We have


$$(a_{n+1}a_ma_{m+1},\ a_na_{n+1}a_{m+1}) = a_{n+1}a_{m+1}(a_m, a_n) = a_{n+1}a_{m+1}(a_m, a_{m+n})$$
$$= (a_{n+1}a_ma_{m+1},\ a_{m+1}a_{n+1}a_{m+n}),$$


and similarly


$$(a_na_{n+1}a_{m+n},\ a_{n+1}a_{m+n}a_{m+n+1}) = (a_na_{n+1}a_{m+n},\ a_{m+1}a_{n+1}a_{m+n}),$$
$$(a_{m+1}a_{m+n}a_{m+n+1},\ a_ma_{m+1}a_{m+n}) = (a_{m+1}a_{m+n}a_{m+n+1},\ a_{m+1}a_{n+1}a_{m+n}).$$


Hence if we put $g = a_{m+1}a_{n+1}a_{m+n}$, then


$$(e, g) = (e, a_na_{n+1}a_{m+1}) = (e, a_{n+1}a_{m+n}a_{m+n+1}) = (e, a_ma_{m+1}a_{m+n})$$
and if we put $f = (e, g)$, then
$$1 = (e/f, a_na_{n+1}a_{m+1}/f) = (e/f, a_{n+1}a_{m+n}a_{m+n+1}/f) = (e/f, a_ma_{m+1}a_{m+n}/f).$$
Hence $(e/f, P/f^3) = 1$, where


$$P = a_na_{n+1}a_{m+1} \cdot a_{n+1}a_{m+n}a_{m+n+1} \cdot a_ma_{m+1}a_{m+n}.$$


But $P$ is divisible by $e^3$, since we can also write


$$P = a_{n+1}a_ma_{m+1} \cdot a_na_{n+1}a_{m+n} \cdot a_{m+1}a_{m+n}a_{m+n+1}.$$


Hence the previous relation implies $e/f = 1$. Thus $e = f$ is a common divisor of
$a_na_{n+1}a_{m+1}$, $a_{n+1}a_{m+n}a_{m+n+1}$ and $a_ma_{m+1}a_{m+n}$, as we wished to show.


For the binomial coefficient case, i.e. $a_n = n$, the property (iv) of Proposition 12
was discovered empirically by Gould (1972) and then proved by Hillman and
Hoggatt (1972). It states that if in the Pascal triangle one picks out the hexagon
surrounding a particular element, then the greatest common divisor of three alternately
chosen vertices is equal to the greatest common divisor of the remaining three vertices.
Hillman and Hoggatt also gave generalizations along the lines of Proposition 12.
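Property (iv) for $a_n = n$ (Gould's hexagon property) can be tested in the notation of Proposition 12, where $A_{m,n} = {}^{m+n}C_n$. This sketch assumes Python 3.9 or later for the multi-argument `math.gcd`; the function names are ours, and the finite range merely illustrates the theorem.

```python
from math import comb, gcd


def A(m, n):
    """A_{m,n} for a_n = n, which is the binomial coefficient C(m+n, n)."""
    return comb(m + n, n)


def hexagon_gcds(m, n):
    """gcds of the two alternating vertex triples of Proposition 12(iv) around A_{m,n}."""
    left = gcd(A(m, n - 1), A(m + 1, n), A(m - 1, n + 1))
    right = gcd(A(m - 1, n), A(m + 1, n - 1), A(m, n + 1))
    return left, right


assert all(len(set(hexagon_gcds(m, n))) == 1
           for m in range(1, 60) for n in range(1, 60))
print(hexagon_gcds(6, 3))  # (2, 2): the hexagon around C(9, 3) = 84
```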

The hypotheses of Proposition 12 are also satisfied if $a_n = q^n - 1$, for some
integer $q > 1$, since in this case $a_{m+n} = a_ma_n + a_m + a_n$. The corresponding
q-binomial coefficients were studied by Gauss and, as mentioned in Chapter XIII, they 
play a role in the theory of partitions. 

We may also take $(a_n)$ to be the sequence defined recurrently by


$$a_1 = 1, \quad a_2 = c, \quad a_{n+2} = ca_{n+1} + ba_n \quad (n \ge 1),$$
where $b$ and $c$ are coprime positive integers. Indeed it is easily shown by induction that
$$(a_n, a_{n+1}) = (b, a_{n+1}) = 1 \quad \text{for all } n \ge 1.$$
By induction on $m$ one may also show that
$$a_{m+n} = a_{m+1}a_n + ba_ma_{n-1} \quad \text{for all } m \ge 1,\ n > 1.$$


It follows that the hypotheses of Proposition 12 are satisfied. In particular, for
$b = c = 1$, they are satisfied by the sequence of Fibonacci numbers.
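Parts (i) and (ii) of Proposition 12 can be spot-checked for the sequences just mentioned: the recurrence with $b = c = 1$ (the Fibonacci numbers) and $a_n = q^n - 1$, here with $q = 3$. In this Python sketch (names ours) a sequence is stored as a list with a dummy entry at index 0, so that `a[n]` is $a_n$.

```python
from math import gcd, prod


def recurrent_sequence(b, c, length):
    """[0, a_1, ..., a_length] with a_1 = 1, a_2 = c, a_{n+2} = c*a_{n+1} + b*a_n.
    Index 0 is a placeholder so that a[n] is a_n."""
    a = [0, 1, c]
    while len(a) <= length:
        a.append(c * a[-1] + b * a[-2])
    return a


def satisfies_12_i_ii(a, N):
    """Check (a_m, a_n) = a_{(m,n)} and A_{m,n} in Z for all m, n >= 1 with m + n <= N."""
    for m in range(1, N):
        for n in range(1, N - m + 1):
            if gcd(a[m], a[n]) != a[gcd(m, n)]:
                return False
            numerator = prod(a[m + 1:m + n + 1])  # a_{m+1} ... a_{m+n}
            denominator = prod(a[1:n + 1])        # a_1 ... a_n
            if numerator % denominator != 0:
                return False
    return True


fib = recurrent_sequence(1, 1, 40)           # b = c = 1: Fibonacci numbers
print(fib[1:11])                             # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
q_minus_1 = [3 ** n - 1 for n in range(41)]  # a_n = 3**n - 1 (a[0] unused)
print(satisfies_12_i_ii(fib, 40), satisfies_12_i_ii(q_minus_1, 40))  # True True
```

A sequence such as $a_n = n + 1$ fails (i) at once, since $(a_1, a_2) = (2, 3) = 1 \ne a_1$.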

We consider finally extensions of our results to more general algebraic structures. 
An integral domain $R$ is said to be a Bézout domain if any $a, b \in R$ have a common
divisor of the form $au + bv$ for some $u, v \in R$. Since such a common divisor
is necessarily a greatest common divisor, any Bézout domain is a GCD domain. It is 
easily seen, by induction on the number of generators, that an integral domain is a 
Bézout domain if and only if every finitely generated ideal is a principal ideal. Thus 
Propositions 10 and 11 continue to hold if Z is replaced by any Bézout domain. 

An integral domain R is said to be a principal ideal domain if every ideal is a 
principal ideal. 


Lemma 13 An integral domain R is a principal ideal domain if and only if it is a 
Bézout domain satisfying the chain condition 


(#) there exists no infinite sequence $(a_n)$ of elements of $R$ such that $a_{n+1}$ is a proper
divisor of $a_n$ for every $n$.


Proof It is obvious that any principal ideal domain is a Bézout domain. Suppose $R$ is
a Bézout domain, but not a principal ideal domain. Then $R$ contains an ideal $J$ which
is not finitely generated. Hence there exists a sequence $(b_n)$ of elements of $J$ such that
$b_{n+1}$ is not in the ideal $J_n$ generated by $b_1, \dots, b_n$. But $J_n$ is a principal ideal. If $a_n$
generates $J_n$, then $a_{n+1}$ is a proper divisor of $a_n$ for every $n$. Thus the chain condition
is violated.

Suppose now that $R$ is a Bézout domain containing a sequence $(a_n)$ such that $a_{n+1}$
is a proper divisor of $a_n$ for every $n$. Let $J$ denote the set of all elements of $R$ which
are divisible by at least one term of this sequence. Then $J$ is an ideal. For if $a_j \mid b$ and
$a_k \mid c$, where $j \le k$, then also $a_k \mid b$ and hence $a_k \mid bx + cy$ for all $x, y \in R$. If $J$ were
generated by a single element $a$, we would have $a \mid a_n$ for every $n$. On the other hand,
since $a \in J$, $a_N \mid a$ for some $N$. Hence $a_N \mid a_{N+1}$. Since $a_{N+1}$ is a proper divisor of $a_N$,
this is a contradiction. Thus $R$ is not a principal ideal domain.

It follows from the remarks at the end of Section 1 that a principal ideal domain
is factorial, i.e. any element which is neither zero nor a unit can be represented as a 
product of finitely many irreducibles and the representation is essentially unique. 

In the next section we will show that the ring $K[t]$ of all polynomials in one
indeterminate $t$ with coefficients from an arbitrary field $K$ is a principal ideal domain.

It may be shown that the ring of all algebraic integers is a Bézout domain, and
likewise the ring of all functions which are holomorphic in a nonempty connected
open subset $G$ of the complex plane $\mathbb{C}$. However, neither is a principal ideal domain.
In the former case there are no irreducibles, since any algebraic integer $a$ has the
factorization $a = \sqrt{a} \cdot \sqrt{a}$. In the latter case $z - \zeta$ is an irreducible for any $\zeta \in G$, but
the chain condition is violated. For example, take


$$a_n(z) = f(z)/\big((z - \zeta_1) \cdots (z - \zeta_n)\big),$$


where $f(z)$ is a non-identically vanishing function which is holomorphic in $G$ and has
infinitely many zeros $\zeta_1, \zeta_2, \dots$ in $G$.


3 Polynomials 


In this section we study the most important example of a principal ideal domain other
than $\mathbb{Z}$, namely the ring $K[t]$ of all polynomials in $t$ with coefficients from an arbitrary
field $K$ (e.g., $K = \mathbb{Q}$ or $\mathbb{C}$).

The attitude adopted towards polynomials in algebra is different from that adopted in analysis. In analysis we regard ‘$t$’ as a variable which can take different values. In algebra we regard ‘$t$’ simply as a symbol, an ‘indeterminate’, on which we can perform various algebraic operations. Since the concept of function is so pervasive, the algebraic approach often seems mysterious at first sight. It therefore seems worthwhile to take the time to give a precise meaning to an ‘indeterminate’.

Let $R$ be an integral domain (e.g., $R = \mathbb{Z}$ or $\mathbb{Q}$). A polynomial with coefficients from $R$ is defined to be a sequence $f = (a_0, a_1, a_2, \ldots)$ of elements of $R$ in which at most finitely many terms are nonzero. The sum and product of two polynomials

$$f = (a_0, a_1, a_2, \ldots), \quad g = (b_0, b_1, b_2, \ldots)$$

are defined by

$$f + g = (a_0 + b_0,\ a_1 + b_1,\ a_2 + b_2,\ \ldots),$$
$$fg = (a_0b_0,\ a_0b_1 + a_1b_0,\ a_0b_2 + a_1b_1 + a_2b_0,\ \ldots).$$

It is easily verified that these are again polynomials and that the set $R[t]$ of all polynomials with coefficients from $R$ is a commutative ring with $O = (0, 0, 0, \ldots)$ as zero element. (By dropping the requirement that at most finitely many terms are nonzero, we obtain the ring $R[[t]]$ of all formal power series with coefficients from $R$.)
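The definition above can be mirrored directly in code. The following Python sketch is illustrative only, and the helper names `normalize`, `poly_add` and `poly_mul` are hypothetical. It stores a polynomial as the finite list `[a0, a1, a2, ...]` of its coefficients and drops trailing zeros, so that $O$ becomes the empty list. It forms the $k$-th coefficient of $fg$ as $a_0b_k + a_1b_{k-1} + \cdots + a_kb_0$.

```python
from itertools import zip_longest

def normalize(coeffs):
    """Drop trailing zero coefficients; the zero polynomial O becomes []."""
    f = list(coeffs)
    while f and f[-1] == 0:
        f.pop()
    return f

def poly_add(f, g):
    """f + g = (a0 + b0, a1 + b1, a2 + b2, ...)."""
    return normalize(a + b for a, b in zip_longest(f, g, fillvalue=0))

def poly_mul(f, g):
    """fg: the k-th coefficient is a0*bk + a1*b(k-1) + ... + ak*b0."""
    if not f or not g:
        return []
    prod = [0] * (len(f) + len(g) - 1)
    for j, a in enumerate(f):
        for k, b in enumerate(g):
            prod[j + k] += a * b
    return normalize(prod)

# (1 + t)(1 - t) = 1 - t^2
print(poly_mul([1, 1], [1, -1]))   # [1, 0, -1]
```

Because the lists carry no trailing zeros, the degree of a nonzero polynomial `f` is simply `len(f) - 1`.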

We define the degree $\partial(f)$ of a polynomial $f = (a_0, a_1, a_2, \ldots) \ne O$ to be the greatest integer $n$ for which $a_n \ne 0$, and we put

$$|f| = 2^{\partial(f)}, \quad |O| = 0.$$

It is easily verified that, for all polynomials $f, g$,

$$|f + g| \le \max\{|f|, |g|\}, \quad |fg| = |f|\,|g|.$$

Since $|f| \ge 0$, with equality if and only if $f = O$, the last property implies that $R[t]$ is an integral domain. Thus we can define divisibility in $R[t]$, as explained in Section 1.
The set of all polynomials of the form $(a_0, 0, 0, \ldots)$ is a subdomain isomorphic to $R$. By identifying this set with $R$, we may regard $R$ as embedded in $R[t]$. The only units in $R[t]$ are the units in $R$, since $1 = ef$ implies $1 = |e|\,|f|$ and hence $|e| = 1$. If we put $t = (0, 1, 0, 0, \ldots)$, then

$$t^2 = tt = (0, 0, 1, 0, \ldots), \quad t^3 = tt^2 = (0, 0, 0, 1, \ldots), \quad \ldots.$$


Hence if the polynomial $f = (a_0, a_1, a_2, \ldots)$ has degree $n$, then it can be uniquely expressed in the form

$$f = a_0 + a_1t + \cdots + a_nt^n \quad (a_n \ne 0).$$

We refer to the elements $a_0, a_1, \ldots, a_n$ of $R$ as the coefficients of $f$. In particular, $a_0$ is the constant coefficient and $a_n$ the highest coefficient. We say that $f$ is monic if its highest coefficient $a_n = 1$.

If also

$$g = b_0 + b_1t + \cdots + b_mt^m \quad (b_m \ne 0),$$

then the sum and product assume their familiar forms:

$$f + g = (a_0 + b_0) + (a_1 + b_1)t + (a_2 + b_2)t^2 + \cdots,$$
$$fg = a_0b_0 + (a_0b_1 + a_1b_0)t + (a_0b_2 + a_1b_1 + a_2b_0)t^2 + \cdots.$$


Suppose now that $R = K$ is a field, and let

$$f = a_0 + a_1t + \cdots + a_nt^n \quad (a_n \ne 0),$$
$$g = b_0 + b_1t + \cdots + b_mt^m \quad (b_m \ne 0)$$

be any two nonzero elements of $K[t]$. If $|g| < |f|$, i.e. if $m < n$, then $g = qf + r$ with $q = O$ and $r = g$. Suppose on the other hand that $|f| \le |g|$. Then

$$g = a_n^{-1}b_mt^{m-n}f + g^*,$$

where $g^* \in K[t]$ and $|g^*| < |g|$. If $|f| \le |g^*|$, the process can be repeated with $g^*$ in place of $g$. Continuing in this way, we obtain $q, r \in K[t]$ such that

$$g = qf + r, \quad |r| < |f|.$$

Moreover, $q$ and $r$ are uniquely determined, since if also

$$g = q_1f + r_1, \quad |r_1| < |f|,$$

then

$$(q - q_1)f = r_1 - r, \quad |r_1 - r| < |f|,$$

which is only possible if $q = q_1$, and then also $r = r_1$.
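The division step just described can be sketched in Python. This is a minimal illustration under stated assumptions, not a definitive implementation:

- it takes $K = \mathbb{Q}$, with coefficients held as `fractions.Fraction` values;
- a polynomial is represented by the coefficient list `[a0, a1, ..., an]`;
- the name `poly_divmod` is hypothetical.

Each pass subtracts $a_n^{-1}b_mt^{m-n}f$ from the current remainder, exactly as in the construction of $g^*$.

```python
from fractions import Fraction

def poly_divmod(g, f):
    """Return (q, r) with g = q*f + r and |r| < |f|, over the field Q.

    g and f are coefficient lists [c0, c1, ...] with no trailing zeros,
    and f must be nonzero.
    """
    if not f:
        raise ZeroDivisionError("division by the zero polynomial O")
    n, a_n = len(f) - 1, Fraction(f[-1])
    r = [Fraction(c) for c in g]
    q = [Fraction(0)] * max(len(g) - n, 0)
    while len(r) - 1 >= n:            # i.e. while |f| <= |r|
        m = len(r) - 1
        coeff = r[-1] / a_n           # a_n^(-1) b_m
        q[m - n] = coeff
        for i, c in enumerate(f):     # r := r - coeff * t^(m-n) * f
            r[i + m - n] -= coeff * c
        while r and r[-1] == 0:       # the degree of r drops
            r.pop()
    return q, r
```

For instance, `poly_divmod([1, 0, 1], [0, 2])` divides $t^2 + 1$ by $2t$ and gives $q = t/2$, $r = 1$.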
Ideals in $K[t]$ can be defined in the same way as for $\mathbb{Z}$, and the proof of Lemma 9 remains valid. Thus $K[t]$ is a principal ideal domain and, a fortiori, a GCD domain.

The Euclidean algorithm can also be applied in $K[t]$ in the same way as for $\mathbb{Z}$. It provides a sequence of polynomials $f_0, f_1, \ldots, f_N$, where $f_N$ is the greatest common divisor of $f_0$ and $f_1$. Again, from this sequence we can obtain polynomials $u_k, v_k$ such that

$$f_k = f_0u_k + f_1v_k \quad (0 \le k \le N).$$

We can actually say more for polynomials than for integers, since if

$$f_{k-1} = q_kf_k + f_{k+1}, \quad |f_{k+1}| < |f_k|,$$

then $|f_{k-1}| = |q_k|\,|f_k|$ and hence, by induction,

$$|f_{k-1}|\,|u_k| = |f_1|, \quad |f_{k-1}|\,|v_k| = |f_0| \quad (1 < k \le N).$$
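The identities above can be checked on examples with the following Python sketch over $K = \mathbb{Q}$. It is illustrative only:

- all function names are hypothetical;
- polynomials are coefficient lists `[a0, a1, ...]` of `Fraction` values;
- the cofactors are built with the usual recurrences $u_{k+1} = u_{k-1} - q_ku_k$ and $v_{k+1} = v_{k-1} - q_kv_k$, starting from $u_0 = 1$, $v_0 = 0$, $u_1 = 0$, $v_1 = 1$.

The last point is an assumption about how the $u_k, v_k$ are produced, not a quotation from the text.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients (the zero polynomial is [])."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def sub(f, g):
    """f - g for coefficient lists of possibly different lengths."""
    n = max(len(f), len(g))
    return trim((f[i] if i < len(f) else 0) - (g[i] if i < len(g) else 0)
                for i in range(n))

def mul(f, g):
    """Product of two coefficient lists."""
    out = [Fraction(0)] * max(len(f) + len(g) - 1, 0)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return trim(out)

def divmod_poly(g, f):
    """Division algorithm in Q[t]: g = q*f + r with |r| < |f| (f != O)."""
    q = [Fraction(0)] * max(len(g) - len(f) + 1, 0)
    r = [Fraction(c) for c in g]
    while len(r) >= len(f):
        shift, coeff = len(r) - len(f), r[-1] / f[-1]
        q[shift] = coeff
        r = sub(r, [0] * shift + [coeff * c for c in f])
    return trim(q), r

def extended_euclid(f0, f1):
    """Return fs, us, vs with fs[k] = f0*us[k] + f1*vs[k] for 0 <= k <= N.

    fs[-1] = f_N is a greatest common divisor of f0 and f1 (f1 != O).
    """
    fs = [trim(Fraction(c) for c in f0), trim(Fraction(c) for c in f1)]
    us, vs = [[Fraction(1)], []], [[], [Fraction(1)]]
    while fs[-1]:
        q, r = divmod_poly(fs[-2], fs[-1])      # f_{k-1} = q_k f_k + f_{k+1}
        fs.append(r)
        us.append(sub(us[-2], mul(q, us[-1])))  # u_{k+1} = u_{k-1} - q_k u_k
        vs.append(sub(vs[-2], mul(q, vs[-1])))  # v_{k+1} = v_{k-1} - q_k v_k
    return fs[:-1], us[:-1], vs[:-1]           # drop the zero remainder f_{N+1}
```

Take $f_0 = t^3 - 1$ and $f_1 = t^2 - 1$. The first division leaves the remainder $f_2 = t - 1 = f_0 - tf_1$, which divides $f_1$ exactly, so $N = 2$. Here $u_2 = 1$ and $v_2 = -t$, and indeed $|f_1|\,|u_2| = |f_1|$ and $|f_1|\,|v_2| = |f_0|$.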


It may be noted in passing that the Euclidean algorithm can also be applied in the ring $K[t, t^{-1}]$ of Laurent polynomials. A Laurent polynomial $f \ne O$, with coefficients from the field $K$, has the form

$$f = a_mt^m + a_{m+1}t^{m+1} + \cdots + a_nt^n,$$

where $m, n \in \mathbb{Z}$ with $m \le n$ and $a_j \in K$ with $a_ma_n \ne 0$. Thus we can write $f = t^mf_0$, where $f_0 \in K[t]$. Put

$$|f| = 2^{n-m}, \quad |O| = 0;$$

then the division algorithm for ordinary polynomials implies one for Laurent polynomials. For any $f, g \in K[t, t^{-1}]$ with $f \ne O$, there exist $q, r \in K[t, t^{-1}]$ such that

$$g = qf + r, \quad |r| < |f|.$$

We return now to ordinary polynomials. The general definition for integral domains in Section 1 means, in the present case, that a polynomial $p \in K[t]$ is irreducible if it has positive degree and if every proper divisor has degree zero.

It follows that any polynomial of degree 1 is irreducible. However, there may also exist irreducible polynomials of higher degree. For example, we will show shortly that the polynomial $t^2 - 2$ is irreducible in $\mathbb{Q}[t]$. For $K = \mathbb{C}$, however, every irreducible polynomial has degree 1, by the fundamental theorem of algebra (Theorem I.30) and Proposition 14 below. It follows that, for $K = \mathbb{R}$, every irreducible polynomial has degree 1 or 2. (For if a real polynomial $f(t)$ has a root $\alpha \in \mathbb{C}\setminus\mathbb{R}$, its conjugate $\bar\alpha$ is also a root and $f(t)$ has the real irreducible factor $(t - \alpha)(t - \bar\alpha)$.)

It is obvious that the chain condition (#) of Section 1 holds in the integral domain $K[t]$, since if $g$ is a proper divisor of $f$, then $|g| < |f|$. It follows that any polynomial of positive degree can be represented as a product of finitely many irreducible polynomials and that the representation is essentially unique.

We now consider the connection between polynomials in the sense of algebra (polynomial forms) and polynomials in the sense of analysis (polynomial functions). Let $K$ be a field and $f \in K[t]$:

$$f = a_0 + a_1t + \cdots + a_nt^n.$$

If we replace ‘$t$’ by $c \in K$ we obtain an element of $K$, which we denote by $f(c)$:

$$f(c) = a_0 + a_1c + \cdots + a_nc^n.$$


A rapid procedure (‘Horner’s rule’) for calculating $f(c)$ is to use the recurrence relations

$$f_0 = a_n, \quad f_j = f_{j-1}c + a_{n-j} \quad (j = 1, \ldots, n).$$

It is readily shown by induction that

$$f_j = a_nc^j + a_{n-1}c^{j-1} + \cdots + a_{n-j},$$

and hence $f(c) = f_n$ is obtained with just $n$ multiplications and $n$ additions.
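Horner's rule translates into a short loop. In this illustrative Python sketch the name `horner` is hypothetical. The coefficients are passed lowest degree first, as `[a0, a1, ..., an]`, and the accumulator plays the role of $f_j$.

```python
def horner(coeffs, c):
    """Evaluate f(c) for f = a0 + a1 t + ... + an t^n, coeffs = [a0, ..., an].

    Uses f_0 = a_n and f_j = f_{j-1} * c + a_{n-j}; returns f_n = f(c)
    after n multiplications and n additions.
    """
    if not coeffs:
        return 0                      # the zero polynomial O
    acc = coeffs[-1]                  # f_0 = a_n
    for a in reversed(coeffs[:-1]):   # a_{n-1}, ..., a_0
        acc = acc * c + a             # f_j = f_{j-1} c + a_{n-j}
    return acc

print(horner([-1, 2, -6, 2], 3))      # 5 = 2*27 - 6*9 + 2*3 - 1
```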

It is easily seen that $f = g + h$ implies $f(c) = g(c) + h(c)$, and $f = gh$ implies $f(c) = g(c)h(c)$. Thus the mapping $f \mapsto f(c)$ is a ‘homomorphism’ of $K[t]$ into $K$.
A simple consequence is the so-called remainder theorem:

Proposition 14 Let $K$ be a field and $c \in K$. If $f \in K[t]$, then

$$f = (t - c)g + f(c)$$

for some $g \in K[t]$. In particular, $f$ is divisible by $t - c$ if and only if $f(c) = 0$.

Proof We already know that there exist $g, r \in K[t]$ such that

$$f = (t - c)g + r, \quad |r| \le 1.$$

Thus $r \in K$ and the homomorphism properties imply that $f(c) = r$.

We say that $c \in K$ is a root of the polynomial $f \in K[t]$ if $f(c) = 0$.


Proposition 15 Let $K$ be a field. If $f \in K[t]$ is a polynomial of degree $n \ge 0$, then $f$ has at most $n$ distinct roots in $K$.

Proof If $f$ is of degree 0, then $f = c$ is a nonzero element of $K$ and $f$ has no roots. Suppose now that $n \ge 1$ and the result holds for polynomials of degree less than $n$. If $c$ is a root of $f$ then, by Proposition 14, $f = (t - c)g$ for some $g \in K[t]$. Since $g$ has degree $n - 1$, it has at most $n - 1$ roots. But every root of $f$ distinct from $c$ is a root of $g$. Hence $f$ has at most $n$ roots.


We consider next properties of the integral domain $R[t]$, when $R$ is an integral domain rather than a field (e.g., $R = \mathbb{Z}$). The famous Pythagorean proof that $\sqrt{2}$ is irrational is considerably generalized by the following result:

Proposition 16 Let $R$ be a GCD domain and $K$ its field of fractions. Let

$$f = a_0 + a_1t + \cdots + a_nt^n$$

be a polynomial of degree $n > 0$ with coefficients $a_j \in R$ $(0 \le j \le n)$. Suppose $c \in K$ is a root of $f$ and $c = ab^{-1}$, where $a, b \in R$ and $(a, b) = 1$. Then $b \mid a_n$ and $a \mid a_0$. In particular, if $f$ is monic, then $c \in R$.


Proof We have

$$a_0b^n + a_1ab^{n-1} + \cdots + a_{n-1}a^{n-1}b + a_na^n = 0.$$

Hence $b \mid a_na^n$ and $a \mid a_0b^n$. Since $(a^n, b) = (a, b^n) = 1$, by Proposition 3(v), the result follows from Proposition 3(ii).


The polynomial $t^2 - 2$ has no integer roots, since $0, 1, -1$ are not roots and if $c \in \mathbb{Z}$ and $c \ne 0, 1, -1$, then $c^2 \ge 4$. Consequently, by Proposition 16, the polynomial $t^2 - 2$ also has no rational roots. It now follows from Proposition 14 that $t^2 - 2$ is irreducible in $\mathbb{Q}[t]$, since it has no divisors of degree 1.

Proposition 16 was known to Euler (1774) for the case $R = \mathbb{Z}$. In this case it shows that to obtain all rational roots of a polynomial with rational coefficients we need test only a finite number of possibilities, which can be explicitly enumerated. For example, if $z \in \mathbb{Z}$, the cubic polynomial $t^3 + zt + 1$ has no rational roots unless $z = 0$ or $z = -2$, since by Proposition 16 the only candidates are $t = \pm 1$.
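For $R = \mathbb{Z}$ the finite search can be carried out mechanically. The Python sketch below is an illustration under stated assumptions:

- coefficients are integers, given lowest degree first, with nonzero highest coefficient;
- the names `divisors` and `rational_roots` are hypothetical.

Any factor $t$ is removed first, so that the constant coefficient is nonzero. The candidates $\pm a/b$ with $a \mid a_0$ and $b \mid a_n$ are then tested exactly with `Fraction` arithmetic.

```python
from fractions import Fraction

def divisors(n):
    """Positive divisors of the nonzero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]

def rational_roots(coeffs):
    """Rational roots of a0 + a1 t + ... + an t^n, integer coefficients, an != 0.

    By Proposition 16 with R = Z, a root a/b in lowest terms satisfies a | a0
    and b | an, so only finitely many candidates have to be tested.
    """
    roots = set()
    while len(coeffs) > 1 and coeffs[0] == 0:   # t divides f, so 0 is a root
        roots.add(Fraction(0))
        coeffs = coeffs[1:]
    if len(coeffs) > 1:
        for a in divisors(coeffs[0]):
            for b in divisors(coeffs[-1]):
                for c in (Fraction(a, b), Fraction(-a, b)):
                    if sum(a_k * c**k for k, a_k in enumerate(coeffs)) == 0:
                        roots.add(c)
    return sorted(roots)

# t^3 + z t + 1 has a rational root only for z = 0 (root -1) and z = -2 (root 1)
print([z for z in range(-10, 11) if rational_roots([1, z, 0, 1])])   # [-2, 0]
```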

It was shown by Gauss (1801), again for the case $R = \mathbb{Z}$, that Proposition 16 may itself be considerably generalized. His result may be formulated in the following way:

Proposition 17 Let $f, g \in R[t]$, where $R$ is a GCD domain with field of fractions $K$. Then $g$ divides $f$ in $R[t]$ if and only if two conditions hold: $g$ divides $f$ in $K[t]$, and the greatest common divisor of the coefficients of $g$ divides the greatest common divisor of the coefficients of $f$.


Proof For any polynomial $f \in R[t]$, let $c(f)$ denote the greatest common divisor of its coefficients. We say that $f$ is primitive if $c(f) = 1$. We show first that the product $f = gh$ of two primitive polynomials $g, h$ is again primitive.

Let

$$g = b_0 + b_1t + \cdots, \quad h = c_0 + c_1t + \cdots, \quad f = a_0 + a_1t + \cdots,$$

and assume on the contrary that the coefficients $a_i$ have a common divisor $d$ which is not a unit. Then $d$ does not divide all the coefficients $b_j$, nor all the coefficients $c_k$. Let $b_m, c_n$ be the first coefficients of $g, h$ which are not divisible by $d$. Then

$$a_{m+n} = \sum_{j+k=m+n} b_jc_k$$

and $d$ divides every term on the right, except possibly $b_mc_n$. In fact, since $d \mid a_{m+n}$, $d$ must also divide $b_mc_n$. Hence we cannot have both $(d, b_m) = 1$ and $(d, c_n) = 1$. Consequently we can replace $d$ by a proper divisor $d'$, again not a unit, for which $m' + n' > m + n$. Since there exists a divisor $d$ for which $m + n$ is a maximum, this yields a contradiction.

Now let $f, g$ be polynomials in $R[t]$ such that $g$ divides $f$ in $K[t]$. Thus $f = gH$, where $H \in K[t]$. We can write $H = ab^{-1}h_0$, where $a, b$ are coprime elements of $R$ and $h_0$ is a primitive polynomial in $R[t]$. Also

$$f = c(f)f_0, \quad g = c(g)g_0,$$

where $f_0, g_0$ are primitive polynomials in $R[t]$. Hence

$$bc(f)f_0 = ac(g)g_0h_0.$$

Since $g_0h_0$ is primitive, it follows that

$$bc(f) = ac(g).$$

If $H \in R[t]$, then $b = 1$ and so $c(g) \mid c(f)$. On the other hand, if $c(g) \mid c(f)$, then $bc(f)/c(g) = a$. Since $(a, b) = 1$, this implies that $b = 1$ and $H \in R[t]$.
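For $R = \mathbb{Z}$, the first step of this proof (that a product of primitive polynomials is again primitive) is easy to test numerically. The Python sketch below is only illustrative. `content` and `poly_mul` are hypothetical helper names, and `content(f)` computes $c(f)$ as the non-negative greatest common divisor of the coefficients.

```python
from functools import reduce
from math import gcd

def content(f):
    """c(f): greatest common divisor of the integer coefficients of f."""
    return reduce(gcd, f, 0)

def poly_mul(g, h):
    """Product of integer polynomials given as [b0, b1, ...] and [c0, c1, ...]."""
    out = [0] * (len(g) + len(h) - 1)
    for j, b in enumerate(g):
        for k, c in enumerate(h):
            out[j + k] += b * c
    return out

g, h = [2, 3], [5, 0, 7]            # 2 + 3t and 5 + 7t^2, both primitive
print(content(g), content(h), content(poly_mul(g, h)))   # 1 1 1
```

Writing $g = c(g)g_0$ and $h = c(h)h_0$ with $g_0, h_0$ primitive, the same fact gives $c(gh) = c(g)c(h)$ in $\mathbb{Z}[t]$, as in the example `content(poly_mul([4, 6], [3, 9])) == 6`.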


Corollary 18 If $R$ is a GCD domain, then $R[t]$ is also a GCD domain. If, moreover, $R$ is a factorial domain, then $R[t]$ is also a factorial domain.


Proof Let $K$ denote the field of fractions of $R$. Since $K[t]$ is a GCD domain and $R[t] \subseteq K[t]$, $R[t]$ is certainly an integral domain. If $f, g \in R[t]$, then there exists a primitive polynomial $h_0 \in R[t]$ which is a greatest common divisor of $f$ and $g$ in $K[t]$. It follows from Proposition 17 that

$$h = (c(f), c(g))h_0$$

is a greatest common divisor of $f$ and $g$ in $R[t]$.

This proves the first statement of the corollary. It remains to show that if $R$ also satisfies the chain condition (#), then $R[t]$ does likewise. But if $f_n \in R[t]$ and $f_{n+1} \mid f_n$ for every $n$, then $f_n$ must be of constant degree for all large $n$. The second statement of the corollary now also follows from Proposition 17 and the chain condition in $R$. □


It follows by induction that in the statement of Corollary 18 we may replace $R[t]$ by the ring $R[t_1, \ldots, t_m]$ of all polynomials in finitely many indeterminates $t_1, \ldots, t_m$ with coefficients from $R$. In particular, let $K$ be a field. Then any polynomial $f \in K[t_1, \ldots, t_m]$ such that $f \notin K$ can be represented as a product of finitely many irreducible polynomials, and the representation is essentially unique.

It is now easy to give examples of GCD domains which are not Bézout domains. Let $R$ be a GCD domain which is not a field (e.g., $R = \mathbb{Z}$). Then some $a_0 \in R$ is neither zero nor a unit. By Corollary 18, $R[t]$ is a GCD domain and, by Proposition 17, the greatest common divisor in $R[t]$ of the polynomials $a_0$ and $t$ is 1. Suppose there existed $g, h \in R[t]$ such that

$$a_0g + th = 1,$$

where $g = b_0 + b_1t + \cdots$. Then by equating constant coefficients we would obtain $a_0b_0 = 1$, which is a contradiction. Thus $R[t]$ is not a Bézout domain.
As an application of the preceding results we show that if $a_1, \ldots, a_n$ are distinct integers, then the polynomial

$$f = \prod_{j=1}^{n}(t - a_j) - 1$$

is irreducible in $\mathbb{Q}[t]$. Assume, on the contrary, that $f = gh$, where $g, h \in \mathbb{Q}[t]$ and have positive degree. We may suppose without loss of generality that $g \in \mathbb{Z}[t]$ and that the greatest common divisor of the coefficients of $g$ is 1. Since $f \in \mathbb{Z}[t]$, it then follows from Proposition 17 that also $h \in \mathbb{Z}[t]$. Thus $g(a_j)$ and $h(a_j)$ are integers for every $j$. Since $g(a_j)h(a_j) = -1$, it follows that $g(a_j) = -h(a_j)$. Thus the polynomial $g + h$ has the distinct roots $a_1, \ldots, a_n$. Since $g + h$ has degree less than $n$, this is possible only if $g + h = O$. Hence $f = -g^2$. But, since the highest coefficient of $f$ is 1, this is a contradiction.

In general, it is not an easy matter to determine if a polynomial with rational coefficients is irreducible in $\mathbb{Q}[t]$. However, the following irreducibility criterion, due to Eisenstein (1850), is sometimes useful:

Proposition 19 Let

$$f(t) = a_0 + a_1t + \cdots + a_{n-1}t^{n-1} + t^n$$

be a monic polynomial of degree $n$ with integer coefficients. Suppose that $a_0, a_1, \ldots, a_{n-1}$ are all divisible by some prime $p$, but $a_0$ is not divisible by $p^2$. Then $f$ is irreducible in $\mathbb{Q}[t]$.


Proof Assume on the contrary that $f$ is reducible. Then there exist polynomials $g(t), h(t)$ of positive degrees $l, m$ with integer coefficients such that $f = gh$. If

$$g(t) = b_0 + b_1t + \cdots + b_lt^l,$$
$$h(t) = c_0 + c_1t + \cdots + c_mt^m,$$

then $a_0 = b_0c_0$. The hypotheses imply that exactly one of $b_0, c_0$ is divisible by $p$. Without loss of generality, assume it to be $b_0$. Since $p$ divides $a_1 = b_0c_1 + b_1c_0$, it follows that $p \mid b_1$. Since $p$ divides $a_2 = b_0c_2 + b_1c_1 + b_2c_0$, it now follows that $p \mid b_2$. Proceeding in this way, we see that $p$ divides $b_j$ for every $j \le l$. But, since $b_lc_m = 1$, this yields a contradiction.


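It follows from Proposition 19 that, for any prime $p$, the $p$-th cyclotomic polynomial

$$\Phi_p(x) = x^{p-1} + x^{p-2} + \cdots + 1$$

is irreducible in $\mathbb{Q}[x]$. For $\Phi_p(x) = (x^p - 1)/(x - 1)$. If we put $x = 1 + t$, the transformed polynomial is

$$\Phi_p(1 + t) = \frac{(1 + t)^p - 1}{t} = t^{p-1} + \binom{p}{p-1}t^{p-2} + \cdots + \binom{p}{2}t + \binom{p}{1}.$$

This satisfies the hypotheses of Proposition 19. Each coefficient other than the highest is a binomial coefficient $\binom{p}{k}$ with $0 < k < p$, hence divisible by $p$. The constant coefficient is $\binom{p}{1} = p$, which is not divisible by $p^2$.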
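The substitution $x = 1 + t$ and the hypotheses of Proposition 19 can be checked by machine for small primes. In this illustrative Python sketch the helper names are hypothetical. `math.comb` supplies the binomial coefficients, polynomials are coefficient lists with the lowest degree first, and the argument `p` is assumed to be prime.

```python
from math import comb

def is_eisenstein(coeffs, p):
    """Hypotheses of Proposition 19 for a monic integer polynomial.

    coeffs = [a0, a1, ..., a_{n-1}, 1] (lowest degree first); p assumed prime.
    """
    *lower, lead = coeffs
    return (lead == 1 and len(lower) > 0
            and all(a % p == 0 for a in lower)
            and coeffs[0] % (p * p) != 0)

def shifted_cyclotomic(p):
    """Coefficients of Phi_p(1 + t) = ((1 + t)^p - 1)/t, lowest degree first.

    The coefficient of t^j is the binomial coefficient C(p, j + 1).
    """
    return [comb(p, j + 1) for j in range(p)]

print(shifted_cyclotomic(5))   # [5, 10, 10, 5, 1]
print(all(is_eisenstein(shifted_cyclotomic(p), p) for p in (2, 3, 5, 7, 11, 13)))   # True
```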
For any field $K$, we define the formal derivative of a polynomial $f \in K[t]$,

$$f = a_0 + a_1t + \cdots + a_nt^n,$$

to be the polynomial

$$f' = a_1 + 2a_2t + \cdots + na_nt^{n-1}.$$

If the field $K$ is of characteristic 0 (see Chapter I, §8) and $f$ has positive degree, then $\partial(f') = \partial(f) - 1$.
Formal derivatives share the following properties with the derivatives of real analysis:

(i) $(f + g)' = f' + g'$;
(ii) $(cf)' = cf'$ for any $c \in K$;
(iii) $(fg)' = f'g + fg'$;
(iv) $(f^k)' = kf^{k-1}f'$ for any $k \in \mathbb{N}$.

The first two properties are easily established. Property (iii) then need only be verified for monomials $f = t^j$, $g = t^k$, and (iv) follows from (iii) by induction on $k$.
We can use formal derivatives to determine when a polynomial is square-free: 


Proposition 20 Let f be a polynomial of positive degree with coefficients from a 
field K. If f is relatively prime to its formal derivative f', then f is a product of 
irreducible polynomials, no two of which differ by a constant factor. Conversely, if f 
is such a product and if K has characteristic 0, then f is relatively prime to f’. 


Proof If $f = g^2h$ for some polynomials $g, h \in K[t]$ with $\partial(g) > 0$ then, by the rules above,

$$f' = 2gg'h + g^2h'.$$

Hence $g \mid f'$ and $f, f'$ are not relatively prime.

On the other hand, suppose $f = p_1\cdots p_m$ is a product of essentially distinct irreducible polynomials $p_j$. Then

$$f' = p_1'p_2\cdots p_m + p_1p_2'p_3\cdots p_m + \cdots + p_1\cdots p_{m-1}p_m'.$$

If the field $K$ has characteristic 0, then $p_1'$ is of lower degree than $p_1$ and is not the zero polynomial. Thus the first term on the right is not divisible by $p_1$, but all the other terms are. Therefore $p_1 \nmid f'$, and hence $(f', p_1) = 1$. Similarly, $(f', p_j) = 1$ for $1 < j \le m$. Since essentially distinct irreducible polynomials are relatively prime, it follows that

$$(f, f') = 1.$$


For example, it follows from Proposition 20 that the polynomial $t^n - 1 \in K[t]$ is square-free if the characteristic of the field $K$ does not divide the positive integer $n$.
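Proposition 20 suggests a simple test for square-freeness over $\mathbb{Q}$, a field of characteristic 0: compute $(f, f')$ by the Euclidean algorithm. The following Python sketch is an illustration, not a library routine:

- all names are hypothetical;
- polynomials are coefficient lists `[a0, a1, ...]`;
- the greatest common divisor is normalised to be monic, so that coprimality reads as `== [1]`.

```python
from fractions import Fraction

def trim(p):
    """Convert to Fractions and drop trailing zero coefficients."""
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    return p

def derivative(f):
    """(a0 + a1 t + ... + an t^n)' = a1 + 2 a2 t + ... + n an t^(n-1)."""
    return trim(k * a for k, a in enumerate(f) if k > 0)

def poly_mod(g, f):
    """Remainder of g on division by a nonzero, trimmed polynomial f."""
    r = trim(g)
    while len(r) >= len(f):
        shift, coeff = len(r) - len(f), r[-1] / f[-1]
        for i, c in enumerate(f):
            r[i + shift] -= coeff * c
        r = trim(r)
    return r

def poly_gcd(f, g):
    """Monic greatest common divisor, by the Euclidean algorithm."""
    f, g = trim(f), trim(g)
    while g:
        f, g = g, poly_mod(f, g)
    return [c / f[-1] for c in f] if f else f

def is_squarefree(f):
    """Proposition 20 over Q (characteristic 0): square-free iff (f, f') = 1."""
    return poly_gcd(f, derivative(f)) == [1]

print(is_squarefree([-1, 0, 1]), is_squarefree([1, -2, 1]))   # True False
```

The second call examines $(t - 1)^2$, whose common divisor with its derivative $2t - 2$ is $t - 1$.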

4 Euclidean Domains


An integral domain $R$ is said to be Euclidean if it possesses a Euclidean algorithm. That is, there exists a map $\delta: R \to \mathbb{N} \cup \{0\}$ such that, for any $a, b \in R$ with $a \ne 0$, there exist $q, r \in R$ with the properties

$$b = qa + r, \quad \delta(r) < \delta(a).$$

It follows that $\delta(a) > \delta(0)$ for any $a \ne 0$. For there exist $q_1, a_1 \in R$ such that

$$0 = q_1a + a_1, \quad \delta(a_1) < \delta(a),$$

and if $a_n \ne 0$ there exist $q_{n+1}, a_{n+1} \in R$ such that

$$0 = q_{n+1}a_n + a_{n+1}, \quad \delta(a_{n+1}) < \delta(a_n).$$

Repeatedly applying this process, we must arrive at $a_N = 0$ for some $N$, since the sequence $\{\delta(a_n)\}$ cannot decrease forever. We then have $\delta(0) = \delta(a_N) < \cdots < \delta(a_1) < \delta(a)$.

By replacing $\delta$ by $\delta - \delta(0)$ we may, and will, assume that $\delta(0) = 0$ and $\delta(a) > 0$ if $a \ne 0$.

Since the proof of Lemma 9 remains valid if $\mathbb{Z}$ is replaced by $R$ and $|a|$ by $\delta(a)$, any Euclidean domain is a principal ideal domain.

The polynomial ring $K[t]$ is a Euclidean domain with $\delta(a) = |a| = 2^{\partial(a)}$.
Polynomial rings are characterized among all Euclidean domains by the following 
result: 


Proposition 21 For a Euclidean domain $R$, the following conditions are equivalent:

(i) for any $a, b \in R$ with $a \ne 0$, there exist unique $q, r \in R$ such that $b = qa + r$, $\delta(r) < \delta(a)$;
(ii) for any $a, b, c \in R$ with $c \ne 0$,
$$\delta(a + b) \le \max\{\delta(a), \delta(b)\}, \quad \delta(a) \le \delta(ac).$$

Moreover, suppose one or other of these two conditions holds. Then either $R$ is a field and $\delta(a) = \delta(1)$ for every $a \ne 0$, or $R = K[t]$ for some field $K$ and $\delta$ is an increasing function of $|\cdot|$.


Proof Suppose first that (i) holds. If $a \ne 0$, $c \ne 0$, then from $0 = 0a - 0 = ca - ac$, we obtain $\delta(ac) \ge \delta(a)$, and this holds also if $a = 0$. If we take $c = -1$ and replace $a$ by $-a$, we get $\delta(-a) = \delta(a)$. Since $b = 0(a + b) + b = 1(a + b) + (-a)$, it follows that either $\delta(b) \ge \delta(a + b)$ or $\delta(a) \ge \delta(a + b)$. Thus (i) $\Rightarrow$ (ii).

Suppose next that (ii) holds. Assume that, for some $a, b \in R$ with $a \ne 0$, there exist pairs $q, r$ and $q', r'$ such that

$$b = qa + r = q'a + r', \quad \max\{\delta(r), \delta(r')\} < \delta(a).$$

From (ii) we obtain first $\delta(-r) = \delta(r)$ and then $\delta(r' - r) \le \max\{\delta(r), \delta(r')\} < \delta(a)$. Since $r' - r = a(q - q')$, this implies $q - q' = 0$ and hence $r' - r = 0$. Thus (ii) $\Rightarrow$ (i).

Suppose now that (i) and (ii) both hold. Then $\delta(1) \le \delta(a)$ for any $a \ne 0$, since $a = 1a$. Furthermore, $\delta(a) = \delta(ae)$ for any unit $e$, since

$$\delta(a) \le \delta(ae) \le \delta(aee^{-1}) = \delta(a).$$

On the other hand, $\delta(a) = \delta(ae)$ for some $a \ne 0$ implies that $e$ is a unit. For from

$$a = qae + r, \quad \delta(r) < \delta(ae),$$

we obtain $r = (1 - qe)a$, $\delta(r) < \delta(a)$, and hence $1 - qe = 0$. In particular, $\delta(e) = \delta(1)$ if and only if $e$ is a unit.

The set $K$ of all $a \in R$ such that $\delta(a) \le \delta(1)$ thus consists of 0 and all units of $R$. Since $a, b \in K$ implies $a - b \in K$, it follows that $K$ is a field. We assume that $K \ne R$, since otherwise we have the first alternative of the proposition.

Choose $x \in R\setminus K$ so that

$$\delta(x) = \min_{a \in R\setminus K} \delta(a).$$

For any $a \in R\setminus K$, there exist $q_0, r_0 \in R$ such that

$$a = q_0x + r_0, \quad \delta(r_0) < \delta(x),$$

i.e. $r_0 \in K$. Then $\delta(q_0) < \delta(q_0x) = \delta(a - r_0) \le \delta(a)$. If $\delta(q_0) \ge \delta(x)$, i.e. if $q_0 \in R\setminus K$, then in the same way there exist $q_1, r_1 \in R$ such that

$$q_0 = q_1x + r_1, \quad r_1 \in K, \quad \delta(q_1) < \delta(q_0).$$

After finitely many repetitions of this process we must arrive at some $q_{n-1} \in K$. Putting $r_n = q_{n-1}$, we obtain

$$a = r_nx^n + r_{n-1}x^{n-1} + \cdots + r_0,$$

where $r_0, \ldots, r_n \in K$ and $r_n \ne 0$. Now $\delta(r_jx^j) = \delta(x^j)$ if $r_j \ne 0$, and $\delta(x^j) < \delta(x^{j+1})$ for every $j$. It follows that $\delta(a) = \delta(x^n)$. Since the representation $a = qx^n + r$ with $\delta(r) < \delta(x^n)$ is unique, it follows that $r_0, \ldots, r_n$ are uniquely determined by $a$.
Define a map $\psi: R \to K[t]$ by

$$\psi(r_nx^n + r_{n-1}x^{n-1} + \cdots + r_0) = r_nt^n + r_{n-1}t^{n-1} + \cdots + r_0.$$

Then $\psi$ is a bijection and actually an isomorphism, since it preserves sums and products. Furthermore $\delta(a) >, =,$ or $< \delta(b)$ according as $|\psi(a)| >, =,$ or $< |\psi(b)|$.


Some significant examples of principal ideal domains are provided by quadratic fields, which will be studied in Chapter II. Any quadratic number field has the form $\mathbb{Q}(\sqrt{d})$, where $d \in \mathbb{Z}$ is square-free and $d \ne 1$. The set $\mathcal{O}_d$ of all algebraic integers in $\mathbb{Q}(\sqrt{d})$ is an integral domain. In the equivalent language of binary quadratic forms, it was known to Gauss that $\mathcal{O}_d$ is a principal ideal domain for nine negative values of $d$, namely

$$d = -1, -2, -3, -7, -11, -19, -43, -67, -163.$$

Heilbronn and Linfoot (1934) showed that there was at most one additional negative value of $d$ for which $\mathcal{O}_d$ is a principal ideal domain. Stark (1967) proved that this additional value does not in fact exist. Soon afterwards it was observed that a gap in a previous proof by Heegner (1952) could be filled without difficulty. It is conjectured that $\mathcal{O}_d$ is a principal ideal domain for infinitely many positive values of $d$, but this remains unproved.

Much work has been done on determining for which quadratic number fields $\mathbb{Q}(\sqrt{d})$ the ring of integers $\mathcal{O}_d$ is a Euclidean domain. Although we regard being Euclidean more as a useful property than as an important concept, we report here the results which have been obtained, for their intrinsic interest.

The ring $\mathcal{O}_d$ is said to be norm-Euclidean if it is Euclidean when one takes $\delta(a)$ to be the absolute value of the norm of $a$. It has been shown that $\mathcal{O}_d$ is norm-Euclidean for precisely the following values of $d$:

$$d = -11, -7, -3, -2, -1, 2, 3, 5, 6, 7, 11, 13, 17, 19, 21, 29, 33, 37, 41, 57, 73.$$

It is known that, for $d < 0$, $\mathcal{O}_d$ is Euclidean only if it is norm-Euclidean. Comparing the two lists, we see that for $d = -19, -43, -67, -163$, $\mathcal{O}_d$ is a principal ideal domain, but not a Euclidean domain. On the other hand, it is also known that, for $d = 69$, $\mathcal{O}_d$ is Euclidean but not norm-Euclidean.

The invention of a new notation often enables one to replace a long, involved argument by simple and mechanical algebraic operations. This is well illustrated by the congruence notation.

Two integers $a$ and $b$ are said to be congruent modulo a third integer $m$ if $m$ divides $a - b$, and this is denoted by $a \equiv b \bmod m$. For example,

$$13 \equiv 4 \bmod 3, \quad 13 \equiv -7 \bmod 5, \quad 19 \equiv 7 \bmod 4.$$

The notation is a modification by Gauss of the notation $a = b \bmod m$ used by Legendre, as Gauss explicitly acknowledged (D.A., §2). (If $a$ and $b$ are not congruent modulo $m$, we write $a \not\equiv b \bmod m$.) Congruence has, in fact, many properties in common with equality:


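(C1) $a \equiv a \bmod m$ for all $a, m$; (reflexive law)
(C2) if $a \equiv b \bmod m$, then $b \equiv a \bmod m$; (symmetric law)
(C3) if $a \equiv b$ and $b \equiv c \bmod m$, then $a \equiv c \bmod m$; (transitive law)
(C4) if $a \equiv a'$ and $b \equiv b' \bmod m$, then $a + b \equiv a' + b'$ and $ab \equiv a'b' \bmod m$. (replacement laws)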


The proofs of these properties are very simple. For any $a, m$ we have $a - a = 0 = m \cdot 0$. If $m$ divides $a - b$, then it also divides $b - a = -(a - b)$. If $m$ divides both $a - b$ and $b - c$, then it also divides $(a - b) + (b - c) = a - c$. Finally, if $m$ divides both $a - a'$ and $b - b'$, then it also divides $(a - a') + (b - b') = (a + b) - (a' + b')$ and $(a - a')b + a'(b - b') = ab - a'b'$.

The properties (C1)–(C3) state that congruence mod $m$ is an equivalence relation. Since $a = b$ implies $a \equiv b \bmod m$, it is a coarsening of the equivalence relation of equality (but coincides with it if $m = 0$). The corresponding equivalence classes are called residue classes. The set $\mathbb{Z}$ with equality replaced by congruence mod $m$ will be denoted by $\mathbb{Z}_{(m)}$. If $m > 0$, $\mathbb{Z}_{(m)}$ has cardinality $m$, since an arbitrary integer $a$ can be uniquely represented in the form $a = qm + r$, where $r \in \{0, 1, \ldots, m - 1\}$ and $q \in \mathbb{Z}$. The particular $r$ which represents a given $a \in \mathbb{Z}$ is referred to as the least non-negative residue of $a \bmod m$.

The replacement laws imply that the associative, commutative and distributive laws for addition and multiplication are inherited from $\mathbb{Z}$ by $\mathbb{Z}_{(m)}$. Hence $\mathbb{Z}_{(m)}$ is a commutative ring, with 0 as an identity element for addition and 1 as an identity element for multiplication. However, $\mathbb{Z}_{(m)}$ is not an integral domain if $m$ is composite, since if $m = m'm''$ with $1 < m' < m$, then

$$m'm'' \equiv 0, \quad \text{but} \quad m' \not\equiv 0,\ m'' \not\equiv 0 \bmod m.$$

On the other hand, if $ab \equiv ac \bmod m$ and $(a, m) = 1$, then $b \equiv c \bmod m$, by Proposition 3(ii). Thus factors which are relatively prime to the modulus can be cancelled.

In algebraic terms, $\mathbb{Z}_{(m)}$ is the quotient ring $\mathbb{Z}/m\mathbb{Z}$ of $\mathbb{Z}$ with respect to the ideal $m\mathbb{Z}$ generated by $m$, and the elements of $\mathbb{Z}_{(m)}$ are the cosets of this ideal. For convenience, rather than necessity, we suppose from now on that $m > 1$.

Congruences enter implicitly into many everyday problems. For example, the ring $\mathbb{Z}_{(2)}$ contains two distinct elements, 0 and 1, with the addition and multiplication tables

$$0 + 0 = 1 + 1 = 0, \quad 0 + 1 = 1 + 0 = 1,$$
$$0 \cdot 0 = 0 \cdot 1 = 1 \cdot 0 = 0, \quad 1 \cdot 1 = 1.$$

This is the arithmetic of odds (1) and evens (0), which is used by electronic computers. Again, suppose one wants the day of the week on which one was born, given the date and the day of the week today. This is an easy calculation in the arithmetic of $\mathbb{Z}_{(7)}$ (remembering that $366 \equiv 2 \bmod 7$).
The well-known tests for divisibility of an integer by 3 or 9 are easily derived by means of congruences. Let the positive integer $a$ have the decimal representation

$$a = a_0 + a_1 \cdot 10 + \cdots + a_n \cdot 10^n,$$

where $a_0, a_1, \ldots, a_n \in \{0, 1, \ldots, 9\}$. Since $10 \equiv 1 \bmod m$, where $m = 3$ or $9$, the replacement laws imply that $10^k \equiv 1 \bmod m$ for any positive integer $k$ and hence

$$a \equiv a_0 + a_1 + \cdots + a_n \bmod m.$$

Thus $a$ is divisible by 3 or 9 if and only if the sum of its digits is so divisible.

This can be used to check the accuracy of arithmetical calculations. Any equation involving only additions and multiplications must remain valid when equality is replaced by congruence mod $m$. For example, suppose we wish to check if

$$7714 \times 3036 = 23{,}419{,}804.$$

Taking congruences mod 9, we have on the left side $19 \times 12 \equiv 1 \times 3 = 3$ and on the right side $5 + 14 + 12 \equiv 5 + 5 + 3 \equiv 4$. Since $4 \not\equiv 3 \bmod 9$, the original equation is incorrect (the 8 should be a 7).
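Casting out nines, as in the example above, amounts to comparing digit sums mod 9. A minimal Python sketch, with hypothetical names and non-negative integers assumed:

```python
def digit_sum(n):
    """Sum of the decimal digits of a non-negative n; congruent to n mod 9."""
    return sum(int(d) for d in str(n))

def passes_nines_check(x, y, claimed):
    """Casting out nines for a claimed product x * y = claimed.

    False proves the claim wrong; True only means no error was detected
    (for example, swapped digits are not caught).
    """
    return (digit_sum(x) * digit_sum(y) - digit_sum(claimed)) % 9 == 0

print(passes_nines_check(7714, 3036, 23419804))   # False
print(passes_nines_check(7714, 3036, 23419704))   # True
```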

Since the distinct squares in $\mathbb{Z}_{(4)}$ are 0 and 1, it follows that an integer $a \equiv 3 \bmod 4$ cannot be represented as the sum of two squares of integers. Similarly, since the distinct squares in $\mathbb{Z}_{(8)}$ are 0, 1, 4, an integer $a \equiv 7 \bmod 8$ cannot be represented as the sum of three squares of integers.
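The two claims about squares can be confirmed by listing residues. This Python sketch is illustrative. `squares_mod` and `sums_of_squares_mod` are hypothetical names, and residues are given as least non-negative residues.

```python
def squares_mod(m):
    """The distinct squares in Z_(m), as least non-negative residues."""
    return sorted({x * x % m for x in range(m)})

def sums_of_squares_mod(m, count):
    """Residues mod m that are sums of `count` squares."""
    sums = {0}
    for _ in range(count):
        sums = {(s + q) % m for s in sums for q in squares_mod(m)}
    return sorted(sums)

print(squares_mod(4), sums_of_squares_mod(4, 2))   # [0, 1] [0, 1, 2]
print(squares_mod(8), sums_of_squares_mod(8, 3))   # [0, 1, 4] [0, 1, 2, 3, 4, 5, 6]
```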

The oldest known work on number theory is a Babylonian cuneiform text, from at least as early as 1600 B.C., which contains a list of right-angled triangles whose side lengths are all exact multiples of the unit length. By Pythagoras’ theorem, the problem is to find positive integers $x, y, z$ such that

$$x^2 + y^2 = z^2.$$

For example, 3, 4, 5 and 5, 12, 13 are solutions. The number of solutions listed suggests that the Babylonians not only knew the theorem of Pythagoras, but also had some rule for finding such Pythagorean triples. There are in fact infinitely many, and a rule for finding them all is given by Euclid in his Elements (Book X, Lemma 1 following Proposition 28). This rule will now be derived.

We may assume that $x$ and $y$ are relatively prime. For suppose $x, y, z$ is a Pythagorean triple for which $x$ and $y$ have greatest common divisor $d$. Then $d^2 \mid z^2$ and hence $d \mid z$, so that $x/d, y/d, z/d$ is also a Pythagorean triple. If $x$ and $y$ are relatively prime, then they are not both even and without loss of generality we may assume that $x$ is odd. If $y$ were also odd, we would have

$$z^2 = x^2 + y^2 \equiv 1 + 1 = 2 \bmod 4,$$

which is impossible. Hence $y$ is even and $z$ is odd. Then 2 is a common divisor of $z + x$ and $z - x$, and is actually their greatest common divisor, since $(x, y) = 1$ implies $(x, z) = 1$. Since

$$(y/2)^2 = \frac{z + x}{2} \cdot \frac{z - x}{2}$$

and the two factors on the right are relatively prime, they are also squares:

$$(z + x)/2 = a^2, \quad (z - x)/2 = b^2,$$

where $a > b > 0$ and $(a, b) = 1$. Then

$$x = a^2 - b^2, \quad y = 2ab, \quad z = a^2 + b^2.$$

Moreover $a$ and $b$ cannot both be odd, since $z$ is odd.

Conversely, let $a$ and $b$ be relatively prime positive integers with $a > b$ and either $a$ or $b$ even. If $x, y, z$ are defined by these formulas, then $x, y, z$ is a Pythagorean triple. Moreover $x$ is odd, since $z$ is odd and $y$ even, and it is easily verified that $(x, y) = 1$. For given $x$ and $z$, $a^2$ and $b^2$ are uniquely determined, and hence $a$ and $b$ are also. Thus different couples $a, b$ give different solutions $x, y, z$.
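Euclid's rule can be turned into a generator of primitive triples. In this illustrative Python sketch the name `primitive_triples` is hypothetical. The pairs $a > b > 0$ run over coprime pairs of opposite parity, and only triples with $z \le$ `limit` are kept.

```python
from math import gcd

def primitive_triples(limit):
    """Primitive Pythagorean triples (x, y, z), x odd, y even, z <= limit.

    Euclid's rule: x = a^2 - b^2, y = 2ab, z = a^2 + b^2 with a > b > 0,
    (a, b) = 1 and exactly one of a, b even.
    """
    triples = []
    a = 2
    while a * a + 1 <= limit:                  # smallest z for this a is a^2 + 1
        for b in range(1, a):
            if (a - b) % 2 == 1 and gcd(a, b) == 1 and a * a + b * b <= limit:
                triples.append((a * a - b * b, 2 * a * b, a * a + b * b))
        a += 1
    return sorted(triples, key=lambda t: t[2])

print(primitive_triples(25))   # [(3, 4, 5), (5, 12, 13), (15, 8, 17), (7, 24, 25)]
```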

To return to congruences, we now consider the structure of the ring $\mathbb{Z}_{(m)}$. If $a \equiv a' \bmod m$ and $(a, m) = 1$, then also $(a', m) = 1$. Hence we may speak of an element of $\mathbb{Z}_{(m)}$ as being relatively prime to $m$. The set of all elements of $\mathbb{Z}_{(m)}$ which are relatively prime to $m$ will be denoted by $\mathbb{Z}_{(m)}^\times$. If $a$ is a unit of the ring $\mathbb{Z}_{(m)}$, then clearly $a \in \mathbb{Z}_{(m)}^\times$. The following proposition shows that, conversely, if $a \in \mathbb{Z}_{(m)}^\times$, then $a$ is a unit of the ring $\mathbb{Z}_{(m)}$.

Proposition 22 The set $\mathbb{Z}_{(m)}^\times$ is a commutative group under multiplication.

Proof By Proposition 3(iv), $\mathbb{Z}_{(m)}^\times$ is closed under multiplication. Since multiplication is associative and commutative, it only remains to show that any $a \in \mathbb{Z}_{(m)}^\times$ has an inverse $a^{-1} \in \mathbb{Z}_{(m)}^\times$.

The elements of $\mathbb{Z}_{(m)}^\times$ may be taken to be the positive integers $c_1, \ldots, c_h$ which are less than $m$ and relatively prime to $m$, and we may choose the notation so that $c_1 = 1$. Since $ac_j \equiv ac_k \bmod m$ implies $c_j \equiv c_k \bmod m$, the elements $ac_1, \ldots, ac_h$ are distinct elements of $\mathbb{Z}_{(m)}^\times$ and hence are a permutation of $c_1, \ldots, c_h$. In particular, $ac_i \equiv c_1 \bmod m$ for one and only one value of $i$. (The existence of inverses also follows from the Bézout identity $au + mv = 1$, since this implies $au \equiv 1 \bmod m$. Hence the Euclidean algorithm provides a way of calculating $a^{-1}$.)
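The remark that the Euclidean algorithm yields $a^{-1}$ can be made concrete. The Python sketch below uses the hypothetical name `inverse_mod`. It runs the extended Euclidean algorithm on $a$ and $m$, tracking only the coefficient $u$ in a Bézout identity $au + mv = \gcd(a, m)$, and reports an error when $(a, m) \ne 1$.

```python
def inverse_mod(a, m):
    """Inverse of a in Z_(m) (m > 1) via the extended Euclidean algorithm.

    Maintains old_r ≡ old_u * a and r ≡ u * a (mod m); when the remainders
    reach gcd(a, m) = 1 we have a * old_u ≡ 1 (mod m).
    """
    old_r, r = a % m, m
    old_u, u = 1, 0
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_u, u = u, old_u - q * u
    if old_r != 1:
        raise ValueError(f"{a} is not relatively prime to {m}")
    return old_u % m

print(inverse_mod(3, 7), inverse_mod(10, 17))   # 5 12
```

Python 3.8 and later also provide the built-in `pow(a, -1, m)` for the same purpose.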


Corollary 23 If $p$ is a prime, then $\mathbb{Z}_{(p)}$ is a finite field with $p$ elements.

Proof We already know that $\mathbb{Z}_{(p)}$ is a commutative ring, whose distinct elements are represented by the integers $0, 1, \ldots, p - 1$. Since $p$ is a prime, $\mathbb{Z}_{(p)}^\times$ consists of all nonzero elements of $\mathbb{Z}_{(p)}$. Since $\mathbb{Z}_{(p)}^\times$ is a multiplicative group, by Proposition 22, it follows that $\mathbb{Z}_{(p)}$ is a field.


The finite field $\mathbb{Z}_{(p)}$ will be denoted from now on by the more usual notation $\mathbb{F}_p$. Let $p$ be a prime and $f$ a polynomial of degree $n \ge 1$ with integer coefficients, whose highest coefficient is not divisible by $p$. Then Corollary 23, in conjunction with Proposition 15, implies that the congruence

$$f(x) \equiv 0 \bmod p$$

has at most $n$ mutually incongruent solutions mod $p$. This is no longer true if the modulus is not a prime. For example, the congruence $x^2 - 1 \equiv 0 \bmod 8$ has the distinct solutions $x \equiv 1, 3, 5, 7 \bmod 8$.
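For small moduli the solutions of a polynomial congruence can simply be enumerated, which makes the contrast between prime and composite moduli visible. This is an illustrative Python sketch. `roots_mod` is a hypothetical name, and coefficients are listed lowest degree first.

```python
def roots_mod(coeffs, m):
    """All x in {0, ..., m-1} with a0 + a1 x + ... + an x^n ≡ 0 mod m (brute force)."""
    return [x for x in range(m)
            if sum(a * pow(x, k, m) for k, a in enumerate(coeffs)) % m == 0]

print(roots_mod([-1, 0, 1], 8))   # [1, 3, 5, 7]: four roots of x^2 - 1 mod 8
print(roots_mod([-1, 0, 1], 7))   # [1, 6]: at most two, since 7 is prime
```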

The order of the group $\mathbb{Z}_{(m)}^\times$, i.e. the number of positive integers less than $m$ and relatively prime to $m$, is traditionally denoted by $\varphi(m)$, with the convention that $\varphi(1) = 1$. For example, if $p$ is a prime, then $\varphi(p) = p - 1$. More generally, for any positive integer $k$,

$$\varphi(p^k) = p^k - p^{k-1},$$

since the elements of $\mathbb{Z}_{(p^k)}$ which are not in $\mathbb{Z}_{(p^k)}^\times$ are the multiples $jp$ with $0 \le j < p^{k-1}$. By Proposition 4, if $m = m'm''$, where $(m', m'') = 1$, then $\varphi(m) = \varphi(m')\varphi(m'')$. Now suppose an arbitrary positive integer $m$ has the factorization

$$m = p_1^{k_1}\cdots p_s^{k_s}$$

as a product of positive powers of distinct primes. Together with what we have just proved, this implies that

$$\varphi(m) = p_1^{k_1-1}(p_1 - 1)\cdots p_s^{k_s-1}(p_s - 1).$$
In other words,

$$\varphi(m) = m\prod_{p \mid m}(1 - 1/p).$$
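The product formula can be compared with the definition of $\varphi$ by a direct count. This Python sketch is illustrative only. `factorize`, `phi` and `phi_by_count` are hypothetical names, and the factorization uses plain trial division, which is adequate only for small $m$.

```python
from math import gcd

def factorize(m):
    """Trial division: returns {p: k} with m equal to the product of p**k."""
    factors, p = {}, 2
    while p * p <= m:
        while m % p == 0:
            factors[p] = factors.get(p, 0) + 1
            m //= p
        p += 1
    if m > 1:
        factors[m] = factors.get(m, 0) + 1
    return factors

def phi(m):
    """Euler's function via phi(m) = product of p**(k-1) * (p - 1) over p**k || m."""
    result = 1
    for p, k in factorize(m).items():
        result *= p ** (k - 1) * (p - 1)
    return result

def phi_by_count(m):
    """Euler's function straight from the definition (with phi(1) = 1)."""
    return sum(1 for c in range(1, m + 1) if gcd(c, m) == 1)

print(phi(36), phi_by_count(36))   # 12 12
```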


The function $\varphi(m)$ was first studied by Euler and is known as Euler’s phi-function (or ‘totient’ function), although it was Gauss who decided on the letter $\varphi$. Gauss (D.A., §39) also established the following property:


Proposition 24 For any positive integer n, 


> e@) =n, 


d|n 
where the summation is over all positive divisors d of n. 


Proof Let d be a positive divisor of n and let $S_d$ denote the set of all positive integers
$m \le n$ such that $(m, n) = d$. Since $(m, n) = d$ if and only if $(m/d, n/d) = 1$, the
cardinality of $S_d$ is $\varphi(n/d)$. Moreover every positive integer $m \le n$ belongs to exactly
one such set $S_d$. Hence

$$n = \sum_{d \mid n} \varphi(n/d) = \sum_{d \mid n} \varphi(d),$$

since n/d runs through the positive divisors of n at the same time as d.
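Both the identity and the partition used in its proof are easy to test numerically. The sketch below (illustrative names, brute-force $\varphi$) groups $m = 1, \ldots, n$ by $d = (m, n)$ and compares each class size $|S_d|$ with $\varphi(n/d)$.

```python
from math import gcd


def phi(m):
    """Euler's function by direct counting (phi(1) = 1)."""
    return sum(1 for a in range(1, m + 1) if gcd(a, m) == 1)


def classes_by_gcd(n):
    """Map each divisor d of n to |S_d|, where S_d = {m <= n : gcd(m, n) = d}."""
    sizes = {}
    for m in range(1, n + 1):
        d = gcd(m, n)
        sizes[d] = sizes.get(d, 0) + 1
    return sizes


n = 12
sizes = classes_by_gcd(n)
assert all(sizes[d] == phi(n // d) for d in sizes)           # |S_d| = phi(n/d)
print(sum(phi(d) for d in range(1, n + 1) if n % d == 0))    # 12
```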


Much of the significance of Euler’s function stems from the following property:
Proposition 25 If m is a positive integer and a an integer relatively prime to m, then

$$a^{\varphi(m)} \equiv 1 \bmod m.$$

Proof Let $c_1, \ldots, c_h$, where $h = \varphi(m)$, be the distinct elements of $\mathbb{Z}_{(m)}^\times$. As we saw
in the proof of Proposition 22, the elements $ac_1, \ldots, ac_h$ of $\mathbb{Z}_{(m)}^\times$ are just a
permutation of $c_1, \ldots, c_h$. Forming their product, we obtain $a^h c_1 \cdots c_h \equiv c_1 \cdots c_h \bmod m$.
Since the c’s are relatively prime to m, they can be cancelled and we are left with
$a^h \equiv 1 \bmod m$.


Corollary 26 If p is a prime and a an integer not divisible by p, then $a^{p-1} \equiv 1 \bmod p$.


Corollary 26 was stated without proof by Fermat (1640) and is commonly known 
as ‘Fermat’s little theorem’. The first published proof was given by Euler (1736), who 
later (1760) proved the general Proposition 25. 

Proposition 25 is actually a very special case of Lagrange’s theorem that the order 
of a subgroup of a finite group divides the order of the whole group. In the present case 
the whole group is $\mathbb{Z}_{(m)}^\times$ and the subgroup is the cyclic group generated by a.
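The group-theoretic reading can be checked numerically: for each a relatively prime to m, the order of a in $\mathbb{Z}_{(m)}^\times$ divides $\varphi(m)$, and hence $a^{\varphi(m)} \equiv 1 \bmod m$. A brute-force sketch, assuming $m \ge 2$ (helper names are illustrative):

```python
from math import gcd


def phi(m):
    """Euler's function by direct counting."""
    return sum(1 for a in range(1, m + 1) if gcd(a, m) == 1)


def multiplicative_order(a, m):
    """Least d >= 1 with a^d = 1 mod m (assumes gcd(a, m) = 1 and m >= 2)."""
    d, x = 1, a % m
    while x != 1:
        x = x * a % m
        d += 1
    return d


m = 20
orders = {a: multiplicative_order(a, m) for a in range(1, m) if gcd(a, m) == 1}
print(orders)   # {1: 1, 3: 4, 7: 4, 9: 2, 11: 2, 13: 4, 17: 4, 19: 2}
assert all(phi(m) % d == 0 for d in orders.values())    # each order divides phi(20) = 8
assert all(pow(a, phi(m), m) == 1 for a in orders)      # Proposition 25
```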

Euler also gave another proof of Corollary 26, which has its own interest. For any
two integers a, b and any prime p we have, by the binomial theorem,

$$(a+b)^p = \sum_{k=0}^{p} \binom{p}{k} a^k b^{p-k},$$

where the binomial coefficients

$$\binom{p}{k} = \frac{p(p-1)\cdots(p-k+1)}{1 \cdot 2 \cdots k}$$

are integers. Moreover p divides $\binom{p}{k}$ for $0 < k < p$, since p divides $\binom{p}{k} \cdot k!$ and is
relatively prime to $k!$. It follows that

$$(a+b)^p \equiv a^p + b^p \bmod p.$$

In particular, $(a+1)^p \equiv a^p + 1 \bmod p$, from which we obtain by induction
$a^p \equiv a \bmod p$ for every integer a. If p does not divide a, the factor a can be cancelled to
give $a^{p-1} \equiv 1 \bmod p$.
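Both steps of Euler’s argument can be checked by machine for small moduli. This sketch uses `math.comb` from the Python standard library (Python 3.8 or later); `middle_binomials_divisible` is an illustrative name. For composite n the divisibility of the middle binomial coefficients fails, which is why the argument needs p to be prime.

```python
from math import comb


def middle_binomials_divisible(n):
    """True if n divides C(n, k) for every 0 < k < n."""
    return all(comb(n, k) % n == 0 for k in range(1, n))


print(middle_binomials_divisible(7))   # True
print(middle_binomials_divisible(8))   # False: C(8, 2) = 28 is not divisible by 8

# Consequences for a prime p: (a + b)^p = a^p + b^p mod p, and a^p = a mod p for every a.
p = 13
assert all(pow(a + b, p, p) == (pow(a, p, p) + pow(b, p, p)) % p
           for a in range(p) for b in range(p))
assert all(pow(a, p, p) == a % p for a in range(-50, 50))
```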

The first part of the second proof actually shows that in any commutative ring R
of prime characteristic p, the map $a \mapsto a^p$ is a homomorphism:

$$(a+b)^p = a^p + b^p, \quad (ab)^p = a^p b^p.$$

(As defined in §8 of Chapter I, R has characteristic k if k is the least positive integer
such that the sum of k 1’s is 0, and has characteristic zero if there is no such positive
integer.) By way of illustration, we give one important application of this result.

We showed in §3 that, for any prime p, the polynomial

$$\Phi_p(x) = x^{p-1} + x^{p-2} + \cdots + 1$$

is irreducible in $\mathbb{Q}[x]$. The roots in $\mathbb{C}$ of $\Phi_p(x)$ are the p-th roots of unity, other
than 1. By a quite different argument we now show that, for any positive integer n, the
‘primitive’ n-th roots of unity are the roots of a monic polynomial $\Phi_n(x)$ with integer
coefficients which is irreducible in $\mathbb{Q}[x]$. The uniquely determined polynomial $\Phi_n(x)$
is called the n-th cyclotomic polynomial.

Let $\zeta$ be a primitive n-th root of unity, i.e. $\zeta^n = 1$ but $\zeta^k \ne 1$ for $0 < k < n$.
It follows from Corollary 18 that $\zeta$ is a root of some monic irreducible polynomial
$f(x) \in \mathbb{Z}[x]$ which divides $x^n - 1$. If p is a prime which does not divide n, then $\zeta^p$ is
also a primitive n-th root of unity and, for the same reason, $\zeta^p$ is a root of some monic
irreducible polynomial $g(x) \in \mathbb{Z}[x]$ which divides $x^n - 1$.

We show first that $g(x) = f(x)$. Assume on the contrary that $g(x) \ne f(x)$. Then

$$x^n - 1 = f(x)g(x)h(x)$$

for some $h(x) \in \mathbb{Z}[x]$. Since $\zeta$ is a root of $g(x^p)$, we also have

$$g(x^p) = f(x)k(x)$$

for some $k(x) \in \mathbb{Z}[x]$. If $\bar f(x), \ldots$ denotes the polynomial in $\mathbb{F}_p[x]$ obtained from
$f(x), \ldots$ by reducing the coefficients mod p, then

$$x^n - 1 = \bar f(x)\bar g(x)\bar h(x), \quad \bar g(x^p) = \bar f(x)\bar k(x).$$

But $\bar g(x^p) = \bar g(x)^p$, since $\mathbb{F}_p[x]$ is a ring of characteristic p and $a^p = a$ for every
$a \in \mathbb{F}_p$. Hence any irreducible factor $\bar e(x)$ of $\bar f(x)$ in $\mathbb{F}_p[x]$ also divides $\bar g(x)$.
Consequently $\bar e(x)^2$ divides $x^n - 1$ in $\mathbb{F}_p[x]$. But $x^n - 1$ is relatively prime to its formal
derivative $nx^{n-1}$, since $p \nmid n$, and so is square-free. This is the desired contradiction.

By applying this repeatedly for the same or different primes p, we see that $\zeta^m$ is
a root of f(x) for any positive integer m less than n and relatively prime to n. If $\omega$ is
any n-th root of unity, then $\omega = \zeta^k$ for a unique k such that $0 \le k < n$. If $(k, n) \ne 1$,
then $\omega^d = 1$ for some proper divisor d of n (cf. Lemma 31 below). If such an $\omega$ were a
root of f(x), then f(x) would divide $x^d - 1$, which is impossible since $\zeta$ is not a root
of $x^d - 1$. Hence f(x) does not depend on the original choice of primitive n-th root
of unity, its roots being all the primitive n-th roots of unity. The polynomial f(x) will
now be denoted by $\Phi_n(x)$. Since $x^n - 1$ is square-free, we have

$$x^n - 1 = \prod_{d \mid n} \Phi_d(x).$$

This yields a new proof of Proposition 24, since $\Phi_d(x)$ has degree $\varphi(d)$.
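The factorization $x^n - 1 = \prod_{d \mid n} \Phi_d(x)$ also gives a practical way to compute $\Phi_n(x)$: divide $x^n - 1$ by $\Phi_d(x)$ for every proper divisor d of n. The sketch below does this with exact integer arithmetic, polynomials being coefficient sequences with the constant term first; all helper names are illustrative. Since every $\Phi_d$ is monic, the long division never has to divide coefficients.

```python
from functools import lru_cache
from math import gcd


def poly_mul(f, g):
    """Product of two integer polynomials (coefficient sequences, constant term first)."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out


def poly_divexact(f, g):
    """Quotient f / g in Z[x] for monic g, checking that the remainder is zero."""
    f = list(f)
    q = [0] * (len(f) - len(g) + 1)
    for i in range(len(q) - 1, -1, -1):
        c = f[i + len(g) - 1]          # g is monic, so the next quotient term is c * x^i
        q[i] = c
        for j, b in enumerate(g):
            f[i + j] -= c * b
    if any(f):
        raise ValueError("division is not exact")
    return q


@lru_cache(maxsize=None)
def cyclotomic(n):
    """Phi_n(x) as a tuple: x^n - 1 divided by Phi_d(x) for the proper divisors d of n."""
    f = [-1] + [0] * (n - 1) + [1]     # x^n - 1
    for d in range(1, n):
        if n % d == 0:
            f = poly_divexact(f, cyclotomic(d))
    return tuple(f)


def phi(m):
    return sum(1 for a in range(1, m + 1) if gcd(a, m) == 1)


print(cyclotomic(6))                                    # (1, -1, 1), i.e. x^2 - x + 1
print([len(cyclotomic(n)) - 1 for n in range(1, 11)])   # [1, 1, 2, 2, 4, 2, 6, 4, 6, 4]
```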
As an application of Fermat’s little theorem (Corollary 26) we now prove 


Proposition 27 If p is a prime, then $(p-1)! + 1$ is divisible by p.


Proof Since $1! + 1 = 2$, we may suppose that the prime p is odd. By Corollary 26,
the polynomial $f(t) = t^{p-1} - 1$ has the distinct roots $1, 2, \ldots, p-1$ in the field $\mathbb{F}_p$.
But the polynomial $g(t) = (t-1)(t-2)\cdots(t-p+1)$ has the same roots. Since
$f(t) - g(t)$ is a polynomial of degree less than $p - 1$, it follows from Proposition 15
that $f(t) - g(t)$ is the zero polynomial. In particular, f(t) and g(t) have the same
constant coefficient, i.e. $-1 = (-1)^{p-1}(p-1)!$ in $\mathbb{F}_p$. Since $(-1)^{p-1} = 1$, this yields the result.


Proposition 27 is known as Wilson’s theorem, although the first published proof
was given by Lagrange (1773). Lagrange also observed that $(n-1)! + 1$ is divisible
by n only if n is prime. For suppose $n = n'n''$, where $1 < n', n'' < n$. If $n' \ne n''$, then
both $n'$ and $n''$ occur as factors in $(n-1)!$ and hence n divides $(n-1)!$. If $n' = n'' > 2$
then, since $n > 2n'$, both $n'$ and $2n'$ occur as factors in $(n-1)!$ and again n divides
$(n-1)!$. Finally, if $n = 4$, then n divides $(n-1)! + 2$. In every case, therefore, n does not divide $(n-1)! + 1$.
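Wilson’s theorem and Lagrange’s converse together say that n divides $(n-1)! + 1$ exactly when n is prime. A quick numerical check, as a sketch with illustrative helper names (the factorial is reduced mod n as it is built, so the numbers stay small):

```python
from math import isqrt


def wilson_holds(n):
    """True if n divides (n - 1)! + 1."""
    f = 1
    for k in range(2, n):
        f = f * k % n
    return (f + 1) % n == 0


def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))


print([n for n in range(2, 40) if wilson_holds(n)])
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
```

This is of course a hopeless primality test in practice, since it needs about n multiplications.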

As another application of Fermat’s little theorem, we prove Euler’s criterion for
quadratic residues. If p is a prime and a an integer not divisible by p, we say that a
is a quadratic residue, or quadratic nonresidue, of p according as there exists, or does
not exist, an integer c such that $c^2 \equiv a \bmod p$. Thus a is a quadratic residue of p if
and only if it is a square in $\mathbb{F}_p^\times$. Euler’s criterion is the first statement of the following
proposition:


Proposition 28 If p is an odd prime and a an integer not divisible by p, then

$$a^{(p-1)/2} \equiv 1 \text{ or } -1 \bmod p,$$

according as a is a quadratic residue or nonresidue of p.
Moreover, exactly half of the integers $1, 2, \ldots, p-1$ are quadratic residues of p.
Proof If a is a quadratic residue of p, then $a \equiv c^2 \bmod p$ for some integer c and
hence, by Fermat’s little theorem,

$$a^{(p-1)/2} \equiv c^{p-1} \equiv 1 \bmod p.$$

Since the polynomial $t^{(p-1)/2} - 1$ has at most $(p-1)/2$ roots in the field $\mathbb{F}_p$, it follows
that there are at most $r := (p-1)/2$ distinct quadratic residues of p. On the other
hand, no two of the integers $1^2, 2^2, \ldots, r^2$ are congruent mod p, since $u^2 \equiv v^2 \bmod p$
implies $u \equiv v$ or $u \equiv -v \bmod p$. Hence there are exactly $(p-1)/2$ distinct quadratic
residues of p and, if b is a quadratic nonresidue of p, then $b^{(p-1)/2} \not\equiv 1 \bmod p$. Since
$b^{p-1} \equiv 1 \bmod p$, and

$$b^{p-1} - 1 = (b^{(p-1)/2} - 1)(b^{(p-1)/2} + 1),$$

we must have $b^{(p-1)/2} \equiv -1 \bmod p$.
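Euler’s criterion turns the question "is a a square mod p?" into one modular exponentiation. The sketch below (illustrative names) compares it with a brute-force search for a square root, and lists the quadratic residues of 23.

```python
def euler_criterion(a, p):
    """Return a^((p-1)/2) mod p as +1 or -1 (odd prime p, p not dividing a)."""
    r = pow(a, (p - 1) // 2, p)
    assert r in (1, p - 1)          # Proposition 28: no other value can occur
    return 1 if r == 1 else -1


def is_square_mod(a, p):
    """Brute force: is there an integer c with c^2 = a mod p?"""
    return any((c * c - a) % p == 0 for c in range(1, p))


p = 23
print(sorted({c * c % p for c in range(1, p)}))
# [1, 2, 3, 4, 6, 8, 9, 12, 13, 16, 18]: exactly (p - 1)/2 = 11 residues
assert all((euler_criterion(a, p) == 1) == is_square_mod(a, p) for a in range(1, p))
```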


Corollary 29 If p is an odd prime, then $-1$ is a quadratic residue of p if $p \equiv 1 \bmod 4$
and a quadratic nonresidue of p if $p \equiv 3 \bmod 4$.


Euler’s criterion may also be used to determine for what primes 2 is a quadratic 
residue: 


Proposition 30 For any odd prime p, 2 is a quadratic residue of p if $p \equiv \pm 1 \bmod 8$
and a quadratic nonresidue if $p \equiv \pm 3 \bmod 8$.


Proof Let A denote the set of all even integers a such that $p/2 < a < p$, and let B
denote the set of all even integers b such that $0 < b < p/2$. Since $A \cup B$ is the set
of all positive even integers less than p, it has cardinality $r := (p-1)/2$. Evidently
$a \in A$ if and only if $p - a$ is odd and $0 < p - a < p/2$. Hence the integers $1, 2, \ldots, r$
are just the elements of B, together with the integers $p - a$ ($a \in A$). If we denote the
cardinality of A by $\#A$, it follows that

$$\begin{aligned}
r! &= \prod_{a \in A} (p - a) \prod_{b \in B} b \\
   &\equiv (-1)^{\#A} \prod_{a \in A} a \prod_{b \in B} b \bmod p \\
   &= (-1)^{\#A} 2^r r!.
\end{aligned}$$

Thus $2^r \equiv (-1)^{\#A} \bmod p$ and hence, by Proposition 28, 2 is a quadratic residue or
nonresidue of p according as $\#A$ is even or odd. But $\#A = k$ if $p = 4k + 1$ and
$\#A = k + 1$ if $p = 4k + 3$. The result follows.
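Both steps of this proof can be checked for small primes: the congruence $2^r \equiv (-1)^{\#A} \bmod p$, and the final rule in terms of $p \bmod 8$. A sketch with illustrative function names:

```python
def size_of_A(p):
    """#A = number of even integers a with p/2 < a < p."""
    return sum(1 for a in range(2, p, 2) if 2 * a > p)


def two_is_residue(p):
    """Euler's criterion applied to a = 2."""
    return pow(2, (p - 1) // 2, p) == 1


for p in [3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]:
    assert pow(2, (p - 1) // 2, p) == (1 if size_of_A(p) % 2 == 0 else p - 1)  # 2^r = (-1)^#A
    assert two_is_residue(p) == (p % 8 in (1, 7))                              # Proposition 30
```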


We now introduce some simple group-theoretical concepts. Let G be a finite group
and $a \in G$. Then there exist $j, k \in \mathbb{N}$ with $j < k$ such that $a^j = a^k$. Thus $a^{k-j} = 1$,
where 1 is the identity element of G. The order of a is the least positive integer d such
that $a^d = 1$.


Lemma 31 Let G be a finite group of order n and a an element of G of order d. Then
(i) for any $k \in \mathbb{N}$, $a^k = 1$ if and only if d divides k;
(ii) for any $k \in \mathbb{N}$, $a^k$ has order $d/(k, d)$;
(iii) $H = \{1, a, \ldots, a^{d-1}\}$ is a subgroup of G and d divides n.


Proof Any $k \in \mathbb{N}$ can be written in the form $k = qd + r$, where $q \ge 0$ and $0 \le r < d$.
Since $a^{qd} = (a^d)^q = 1$, we have $a^k = 1$ if and only if $a^r = 1$, i.e. if and only if $r = 0$,
by the definition of d.

It follows that if $a^k$ has order e, then $ke = [k, d]$. Since $[k, d] = kd/(k, d)$, this
implies $e = d/(k, d)$. In particular, $a^k$ again has order d if and only if $(k, d) = 1$.
If $0 \le j, k < d$, put $i = j + k$ if $j + k < d$ and $i = j + k - d$ if $j + k \ge d$. Then
$a^j a^k = a^i$, and so H contains the product of any two of its elements. If $0 < k < d$,
then $a^k a^{d-k} = 1$, and so H contains also the inverse of any one of its elements. Finally
d divides n, by Lagrange’s theorem that the order of a subgroup divides the order of
the whole group.




The subgroup H in Lemma 31 is the cyclic subgroup generated by a. For
$G = \mathbb{Z}_{(m)}^\times$, the case which we will be interested in, there is no need to appeal to
Lagrange’s theorem, since $\mathbb{Z}_{(m)}^\times$ has order $\varphi(m)$ and d divides $\varphi(m)$, by Proposition 25
and Lemma 31(i).
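Parts (i) and (ii) of Lemma 31 can be checked in $\mathbb{Z}_{(m)}^\times$ by brute force. In the sketch below (illustrative names; assumes $m \ge 2$ and $(a, m) = 1$), the element 3 of $\mathbb{Z}_{(31)}^\times$ has order 30, and every power $3^k$ is compared with the predicted order $30/(k, 30)$.

```python
from math import gcd


def order_mod(a, m):
    """Multiplicative order of a modulo m (assumes gcd(a, m) = 1 and m >= 2)."""
    d, x = 1, a % m
    while x != 1:
        x = x * a % m
        d += 1
    return d


m, a = 31, 3
d = order_mod(a, m)                 # 30 = phi(31): 3 is a primitive root of 31
for k in range(1, 2 * d + 1):
    ak = pow(a, k, m)
    assert (ak == 1) == (k % d == 0)                 # Lemma 31(i)
    assert order_mod(ak, m) == d // gcd(k, d)        # Lemma 31(ii)
print(d)   # 30
```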

A group G is cyclic if it coincides with the cyclic subgroup generated by one of its
elements. For example, the n-th roots of unity in $\mathbb{C}$ form a cyclic group generated by
$e^{2\pi i/n}$. In fact the generators of this group are just the primitive n-th roots of unity.

Our next result provides a sufficient condition for a finite group to be cyclic. 


Lemma 32 A finite group G of order n is cyclic if, for each positive divisor d of n, 
there are at most d elements of G whose order divides d. 


Proof If H is a cyclic subgroup of G, then its order d divides n. Since all its elements
are of order dividing d, the hypothesis of the lemma implies that any element of G
whose order divides d must be in H. Furthermore, H contains exactly $\varphi(d)$ elements
of order d since, if a generates H, $a^k$ has order d if and only if $(k, d) = 1$.

For each divisor d of n, let $\psi(d)$ denote the number of elements of G of
order d. Then, by what we have just proved, either $\psi(d) = 0$ or $\psi(d) = \varphi(d)$. But
$\sum_{d \mid n} \psi(d) = n$, since the order of each element is a divisor of n, and $\sum_{d \mid n} \varphi(d) = n$,
by Proposition 24. Hence we must have $\psi(d) = \varphi(d)$ for every $d \mid n$. In particular, the
group G has $\psi(n) = \varphi(n) \ge 1$ elements of order n, and so G is cyclic.


The condition of Lemma 32 is also necessary. For let G be a finite cyclic group of
order n, generated by the element a, and let d be a divisor of n. An element $x \in G$ has
order dividing d if and only if $x^d = 1$. Thus the elements $a^k$ of G of order dividing d
are given by $k = jn/d$, with $j = 0, 1, \ldots, d-1$.

We now return from group theory to number theory. 


Proposition 33 For any prime p, the multiplicative group $\mathbb{F}_p^\times$ of the field $\mathbb{F}_p$ is cyclic.


Proof Put $G = \mathbb{F}_p^\times$ and denote the order of G by n. For any divisor d of n, the
polynomial $t^d - 1$ has at most d roots in $\mathbb{F}_p$. Hence there are at most d elements of G
whose order divides d. The result now follows from Lemma 32.
The same argument shows that, for an arbitrary field K, any finite subgroup of the
multiplicative group of K is cyclic. 

In the terminology of number theory, an integer which generates $\mathbb{Z}_{(m)}^\times$ is said to be
a primitive root of m. Primitive roots may be used to replace multiplications mod m
by additions mod $\varphi(m)$ in the same way that logarithms were once used in analysis. If
g is a primitive root of m, then the elements of $\mathbb{Z}_{(m)}^\times$ are precisely $1, g, g^2, \ldots, g^{n-1}$,
where $n = \varphi(m)$. Thus for each $a \in \mathbb{Z}_{(m)}^\times$ we have $a \equiv g^\alpha \bmod m$ for a unique index
$\alpha$ ($0 \le \alpha < n$). We can construct a table of these indices once and for all. If $a \equiv g^\alpha$
and $b \equiv g^\beta$, then $ab \equiv g^{\alpha+\beta}$. By replacing $\alpha + \beta$ by its least non-negative residue $\gamma$
mod n and going backwards in our table we can determine c such that $ab \equiv c \bmod m$.
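For a prime modulus p the whole procedure fits in a few lines. This sketch assumes m = p is prime (so $\varphi(m) = p - 1$); the names are illustrative. It finds the least primitive root using the fact that g has order $p - 1$ if and only if $g^{(p-1)/q} \not\equiv 1 \bmod p$ for every prime q dividing $p - 1$ (a consequence of Lemma 31), builds the index table and its reverse, and multiplies by adding indices.

```python
def prime_factors(n):
    """Set of distinct prime factors of n (trial division)."""
    fs, q = set(), 2
    while q * q <= n:
        while n % q == 0:
            fs.add(q)
            n //= q
        q += 1
    if n > 1:
        fs.add(n)
    return fs


def primitive_root(p):
    """Least primitive root of the prime p."""
    qs = prime_factors(p - 1)
    for g in range(1, p):
        if all(pow(g, (p - 1) // q, p) != 1 for q in qs):
            return g


def index_tables(p):
    """Index table a -> alpha with a = g^alpha mod p, and the reverse table alpha -> g^alpha."""
    g = primitive_root(p)
    antilog = [pow(g, alpha, p) for alpha in range(p - 1)]
    index = {a: alpha for alpha, a in enumerate(antilog)}
    return index, antilog


def multiply_by_indices(a, b, p, index, antilog):
    """a*b mod p computed by adding indices mod p - 1, then reading the table backwards."""
    gamma = (index[a % p] + index[b % p]) % (p - 1)
    return antilog[gamma]


index, antilog = index_tables(23)
print(primitive_root(23), multiply_by_indices(7, 11, 23, index, antilog))  # 5 8
```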

For any prime p, an essentially complete proof for the existence of primitive roots 
of p was given by Euler (1774). Jacobi (1839) constructed tables of indices for all 
primes less than 1000. 

We now use primitive roots to prove a general property of polynomials with coef- 
ficients from a finite field: 


Proposition 34 If $f(x_1, \ldots, x_n)$ is a polynomial of degree less than n in n variables
with coefficients from the finite field $\mathbb{F}_p$, then the number of zeros of f in $\mathbb{F}_p^n$ is divisible
by the characteristic p. In particular, $(0, \ldots, 0)$ is not the only zero of f if f has no
constant term.


Proof Put $K = \mathbb{F}_p$ and $g = 1 - f^{p-1}$. If $a = (a_1, \ldots, a_n)$ is a zero of f, then
$g(a) = 1$. If a is not a zero of f, then $f(a)^{p-1} = 1$ and $g(a) = 0$. Hence the number
N of zeros of f satisfies

$$N \equiv \sum_{a \in K^n} g(a) \bmod p.$$

We will complete the proof by showing that

$$\sum_{a \in K^n} g(a) = 0.$$

Since g has degree less than $n(p-1)$, it is a linear combination, with constant coefficients,
of monomials of the form $x_1^{k_1} \cdots x_n^{k_n}$, where $k_1 + \cdots + k_n < n(p-1)$. Thus $k_j < p - 1$ for
at least one j. Since

$$\sum_{a \in K^n} a_1^{k_1} \cdots a_n^{k_n} = \Big(\sum_{a_1 \in K} a_1^{k_1}\Big) \cdots \Big(\sum_{a_n \in K} a_n^{k_n}\Big),$$

it is enough to show that $S_k := \sum_{a \in K} a^k$ is zero for $0 \le k < p - 1$. If $k = 0$, then
$a^k = 1$ and $S_0 = p \cdot 1 = 0$. Suppose $1 \le k < p - 1$ and let b be a generator for the
multiplicative group $K^\times$ of K. Then $c := b^k \ne 1$ and

$$S_k = \sum_{j=1}^{p-1} c^j = c(c^{p-1} - 1)/(c - 1) = 0.$$
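The key computation, that the power sums $S_k$ vanish in $\mathbb{F}_p$ for $0 \le k < p - 1$, is easy to confirm numerically. In this sketch (illustrative name) the convention $0^0 = 1$ used in the proof matches Python’s `pow(0, 0, p) == 1`.

```python
def power_sum(k, p):
    """S_k = sum of a^k over a in F_p, with the convention 0^0 = 1."""
    return sum(pow(a, k, p) for a in range(p)) % p


p = 11
print([power_sum(k, p) for k in range(p)])
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10]: S_k = 0 for k < p - 1, while S_{p-1} = p - 1 = -1
```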
j=l 
The general case of Proposition 34 was first proved by Warning (1936), after
the particular case had been proved by Chevalley (1936). As an illustration, the
particular case implies that, for any integers a, b, c and any prime p, the congruence
$ax^2 + by^2 + cz^2 \equiv 0 \bmod p$ has a solution in integers x, y, z not all divisible by p.
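For small p the illustration can be checked exhaustively: the quadratic $ax^2 + by^2 + cz^2$ has degree 2, less than the number of variables 3, so its number of zeros in $\mathbb{F}_p^3$ should be divisible by p, and in particular larger than 1. A brute-force sketch (the name `count_zeros` is illustrative):

```python
from itertools import product


def count_zeros(a, b, c, p):
    """Number of (x, y, z) in F_p^3 with a*x^2 + b*y^2 + c*z^2 = 0 mod p."""
    return sum(1 for x, y, z in product(range(p), repeat=3)
               if (a * x * x + b * y * y + c * z * z) % p == 0)


N = count_zeros(1, 1, 1, 7)
print(N, N % 7)   # 49 0
```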

If m is not a prime, then $\mathbb{Z}_{(m)}$ is not a field. However, we now show that the group
$\mathbb{Z}_{(m)}^\times$ is cyclic also if $m = p^2$ is the square of a prime.

Let g be a primitive root of p. It follows from the binomial theorem that

$$(g + p)^p \equiv g^p \bmod p^2.$$

Hence, if $g^p \equiv g \bmod p^2$, then $(g + p)^p \not\equiv g + p \bmod p^2$. Thus, by replacing g by
$g + p$ if necessary, we may assume that $g^{p-1} \not\equiv 1 \bmod p^2$. If the order of g in $\mathbb{Z}_{(p^2)}^\times$ is
d, then d divides $\varphi(p^2) = p(p-1)$. But $\varphi(p) = p - 1$ divides d, since $g^d \equiv 1 \bmod p^2$
implies $g^d \equiv 1 \bmod p$ and g is a primitive root of p. Since p is prime and $d \ne p - 1$,
it follows that $d = p(p-1)$, i.e. $\mathbb{Z}_{(p^2)}^\times$ is cyclic with g as generator.
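The replacement of g by $g + p$ is easy to carry out and to verify. The sketch below (illustrative names; orders computed by brute force) lifts every primitive root of several small primes p to a generator of $\mathbb{Z}_{(p^2)}^\times$, and also lists the pairs $(p, g)$ with $p < 60$ for which the replacement is actually needed; no particular output is claimed for that list.

```python
def order_mod(a, m):
    """Multiplicative order of a modulo m (assumes gcd(a, m) = 1 and m >= 2)."""
    d, x = 1, a % m
    while x != 1:
        x = x * a % m
        d += 1
    return d


def lift_to_p_squared(g, p):
    """Given a primitive root g of p, return g or g + p so that the result^(p-1) != 1 mod p^2."""
    return g if pow(g, p - 1, p * p) != 1 else g + p


for p in [3, 5, 7, 11, 13, 29, 31, 37]:
    for g in range(2, p):
        if order_mod(g, p) == p - 1:                 # g is a primitive root of p
            h = lift_to_p_squared(g, p)
            assert order_mod(h, p * p) == p * (p - 1)

# Primitive roots g of primes p < 60 for which g itself fails modulo p^2.
print([(p, g) for p in range(3, 60) if all(p % d for d in range(2, p))
       for g in range(2, p) if order_mod(g, p) == p - 1 and pow(g, p - 1, p * p) == 1])
```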

We briefly state some further results about primitive roots, although we will not use
them. Gauss (D.A., §§89-92) showed that the group $\mathbb{Z}_{(m)}^\times$ is cyclic if and only if
$m \in \{2, 4, p^k, 2p^k\}$, where p is an odd prime and $k \in \mathbb{N}$. Evidently 1 is a primitive
root of 2 and 3 is a primitive root of 4. If g is a primitive root of $p^2$, where p is an
odd prime, then g is a primitive root of $p^k$ for every $k \in \mathbb{N}$; and if $g' = g$ or $g + p^k$,
according as g is odd or even, then $g'$ is a primitive root of $2p^k$.

By Fermat’s little theorem, if p is prime, then $a^{p-1} \equiv 1 \bmod p$ for every $a \in \mathbb{Z}$
such that $(a, p) = 1$. With the aid of primitive roots we will now show that there
also exist composite integers n such that $a^{n-1} \equiv 1 \bmod n$ for every $a \in \mathbb{Z}$ such that
$(a, n) = 1$.


Proposition 35 For any integer $n > 1$, the following two statements are equivalent:


(i) $a^{n-1} \equiv 1 \bmod n$ for every integer a such that $(a, n) = 1$;
(ii) n is a product of distinct primes and, for each prime $p \mid n$, $p - 1$ divides $n - 1$.


Proof Suppose first that (i) holds and assume that, for some prime p, $p^2 \mid n$. As we
have just proved, there exists a primitive root g of $p^2$. Evidently $p \nmid g$. It is easily
seen that there exists $c \in \mathbb{N}$ such that $a = g + cp^2$ is relatively prime to n; in fact we
can take c to be the product of the distinct prime factors of n, other than p, which do
not divide g. Since n divides $a^{n-1} - 1$, also $p^2$ divides $a^{n-1} - 1$. But a, like g, is a
primitive root of $p^2$, and so its order in $\mathbb{Z}_{(p^2)}^\times$ is $\varphi(p^2) = p(p-1)$. Hence $p(p-1)$
divides $n - 1$. But this contradicts $p \mid n$.

Now let p be any prime divisor of n and let g be a primitive root of p. In the same
way as before, there exists $c \in \mathbb{N}$ such that $a = g + cp$ is relatively prime to n. Arguing
as before, we see that $\varphi(p) = p - 1$ divides $n - 1$. This proves that (i) implies (ii).

Suppose next that (ii) holds and let a be any integer relatively prime to n. If p is a
prime factor of n, then $p \nmid a$ and hence $a^{p-1} \equiv 1 \bmod p$. Since $p - 1$ divides $n - 1$,
it follows that $a^{n-1} \equiv 1 \bmod p$. Thus $a^{n-1} - 1$ is divisible by each prime factor of n
and hence, since n is squarefree, also by n itself.


Proposition 35 was proved by Carmichael (1910), and a composite integer n with 
the equivalent properties stated in the proposition is said to be a Carmichael number. 
Any Carmichael number n must be odd, since it has an odd prime factor p such that
$p - 1$ divides $n - 1$. Furthermore a Carmichael number must have more than two prime
factors. For assume $n = pq$, where $1 < p < q < n$ and $q - 1$ divides $n - 1$. Since
$q \equiv 1 \bmod (q-1)$, it follows that

$$0 \equiv pq - 1 \equiv p - 1 \bmod (q-1),$$

which contradicts $p < q$.

The composite integer $561 = 3 \times 11 \times 17$ is a Carmichael number, since 560 is
divisible by 2, 10 and 16, and it is in fact the smallest Carmichael number. The
taxicab number 1729, which Hardy remarked to Ramanujan was uninteresting, is also a
Carmichael number, since $1729 = 7 \times 13 \times 19$. Indeed it is not difficult to show that
if p, $2p - 1$ and $3p - 2$ are all primes, with $p > 3$, then their product is a Carmichael
number. Recently Alford, Granville and Pomerance (1994) confirmed a long-standing
conjecture by proving that there are infinitely many Carmichael numbers.
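Proposition 35 makes Carmichael numbers easy to list: condition (ii) needs only a factorization, whereas condition (i) involves every a relatively prime to n. The sketch below (illustrative names) compares the two conditions by brute force, lists the composite n below 3000 satisfying (ii), and tests the construction from p, $2p - 1$ and $3p - 2$.

```python
from math import gcd, isqrt


def factorize(n):
    """Prime factorization of n >= 1 as a dict {prime: exponent}."""
    f, q = {}, 2
    while q * q <= n:
        while n % q == 0:
            f[q] = f.get(q, 0) + 1
            n //= q
        q += 1
    if n > 1:
        f[n] = f.get(n, 0) + 1
    return f


def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))


def satisfies_i(n):
    """Statement (i) of Proposition 35, checked for every a coprime to n."""
    return all(pow(a, n - 1, n) == 1 for a in range(1, n) if gcd(a, n) == 1)


def satisfies_ii(n):
    """Statement (ii): n squarefree and p - 1 divides n - 1 for every prime p | n."""
    f = factorize(n)
    return all(e == 1 for e in f.values()) and all((n - 1) % (p - 1) == 0 for p in f)


carmichael = [n for n in range(2, 3000) if not is_prime(n) and satisfies_ii(n)]
print(carmichael)   # [561, 1105, 1729, 2465, 2821]
```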

Our next topic is of greater importance. Many arithmetical problems require for 
their solution the determination of an integer which is congruent to several given 
integers according to various given moduli. We consider first a simple, but important, 
special case. 


Proposition 36 Let $m = m'm''$, where $m'$ and $m''$ are relatively prime integers. Then,
for any integers $a'$, $a''$, there exists an integer a, which is uniquely determined mod m,
such that

$$a \equiv a' \bmod m', \quad a \equiv a'' \bmod m''.$$

Moreover, a is relatively prime to m if and only if $a'$ is relatively prime to $m'$ and $a''$ is
relatively prime to $m''$.


Proof By Proposition 22, there exist integers $c'$, $c''$ such that

$$c'm'' \equiv 1 \bmod m', \quad c''m' \equiv 1 \bmod m''.$$

Thus $e' := c'm''$ is congruent to 1 mod $m'$ and congruent to 0 mod $m''$. Similarly
$e'' := c''m'$ is congruent to 0 mod $m'$ and congruent to 1 mod $m''$. It follows that
$a = a'e' + a''e''$ is congruent to $a'$ mod $m'$ and congruent to $a''$ mod $m''$.

It is evident that if $b \equiv a \bmod m$, then also $b \equiv a' \bmod m'$ and $b \equiv a'' \bmod m''$.
Conversely, if b satisfies these two congruences, then $b - a \equiv 0 \bmod m'$ and
$b - a \equiv 0 \bmod m''$. Hence $b - a \equiv 0 \bmod m$, by Proposition 3(i).

Since $m'$ and $m''$ are relatively prime, it follows from Proposition 3(iv) that
$(a, m) = 1$ if and only if $(a, m') = (a, m'') = 1$. Since $a \equiv a' \bmod m'$ implies
$(a, m') = (a', m')$, and $a \equiv a'' \bmod m''$ implies $(a, m'') = (a'', m'')$, this proves the
last statement of the proposition.
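The proof is constructive, and transcribes directly. This sketch assumes Python 3.8 or later, where `pow(x, -1, m)` returns the inverse of x modulo m (raising `ValueError` if none exists); the name `crt_pair` is illustrative.

```python
def crt_pair(a1, m1, a2, m2):
    """The unique a mod m1*m2 with a = a1 mod m1 and a = a2 mod m2, for coprime m1, m2."""
    c1 = pow(m2, -1, m1)        # c' with c' m'' = 1 mod m'
    c2 = pow(m1, -1, m2)        # c'' with c'' m' = 1 mod m''
    e1, e2 = c1 * m2, c2 * m1   # e' = (1 mod m', 0 mod m''), e'' = (0 mod m', 1 mod m'')
    return (a1 * e1 + a2 * e2) % (m1 * m2)


print(crt_pair(2, 3, 3, 5))   # 8: indeed 8 = 2 mod 3 and 8 = 3 mod 5
```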


In algebraic terms, Proposition 36 says that if $m = m'm''$, where $m'$ and $m''$ are
relatively prime integers, then the ring $\mathbb{Z}_{(m)}$ is (isomorphic to) the direct sum of the rings
$\mathbb{Z}_{(m')}$ and $\mathbb{Z}_{(m'')}$. Furthermore, the group $\mathbb{Z}_{(m)}^\times$ is (isomorphic to) the direct product of
the groups $\mathbb{Z}_{(m')}^\times$ and $\mathbb{Z}_{(m'')}^\times$.
Proposition 36 can be considerably generalized:


Proposition 37 For any integers $m_1, \ldots, m_n$ and $a_1, \ldots, a_n$, the simultaneous
congruences

$$x \equiv a_1 \bmod m_1, \ \ldots, \ x \equiv a_n \bmod m_n$$

have a solution x if and only if

$$a_j \equiv a_k \bmod (m_j, m_k) \quad \text{for } 1 \le j < k \le n.$$

Moreover, y is also a solution if and only if

$$y \equiv x \bmod [m_1, \ldots, m_n].$$


Proof The necessity of the conditions is trivial. For if x is a solution and if
$d_{jk} = (m_j, m_k)$ is the greatest common divisor of $m_j$ and $m_k$, then $a_j \equiv x \equiv a_k \bmod d_{jk}$.
Also, if y is another solution, then $y - x$ is divisible by $m_1, \ldots, m_n$ and hence also by
their least common multiple $[m_1, \ldots, m_n]$.

We prove the sufficiency of the conditions by induction on n. Suppose first that
$n = 2$ and $a_1 \equiv a_2 \bmod d$, where $d = (m_1, m_2)$. By the Bézout identity,

$$d = x_1 m_1 - x_2 m_2$$

for some $x_1, x_2 \in \mathbb{Z}$. Since $a_1 - a_2 = kd$ for some $k \in \mathbb{Z}$, it follows that

$$x := a_1 - kx_1 m_1 = a_2 - kx_2 m_2$$

is a solution.
Suppose next that $n > 2$ and the result holds for all smaller values of n. Then there
exists $x' \in \mathbb{Z}$ such that

$$x' \equiv a_i \bmod m_i \quad \text{for } 1 \le i < n,$$

and $x'$ is uniquely determined mod $m'$, where $m' = [m_1, \ldots, m_{n-1}]$. Since any
solution of the two congruences

$$x \equiv x' \bmod m', \quad x \equiv a_n \bmod m_n$$

is a solution of the given congruences, we need only show that $x' \equiv a_n \bmod (m', m_n)$.
But, by the distributive law connecting greatest common divisors and least common
multiples,

$$(m', m_n) = [(m_1, m_n), \ldots, (m_{n-1}, m_n)].$$

Since $x' \equiv a_i \equiv a_n \bmod (m_i, m_n)$ for $1 \le i < n$, it follows that $x' \equiv a_n \bmod (m', m_n)$.


Corollary 38 Let $m_1, \ldots, m_n$ be integers, any two of which are relatively prime, and
let $m = m_1 \cdots m_n$ be their product. Then, for any given integers $a_1, \ldots, a_n$, there is a
unique integer x mod m such that

$$x \equiv a_1 \bmod m_1, \ \ldots, \ x \equiv a_n \bmod m_n.$$

Moreover, x is relatively prime to m if and only if $a_i$ is relatively prime to $m_i$ for
$1 \le i \le n$.
Corollary 38 can also be proved by an extension of the argument used to prove
Proposition 36. Both Proposition 37 and Corollary 38 are referred to as the Chinese
remainder theorem. Sunzi (4th century A.D.) gave a procedure for obtaining the
solution $x = 23$ of the simultaneous congruences

$$x \equiv 2 \bmod 3, \quad x \equiv 3 \bmod 5, \quad x \equiv 2 \bmod 7.$$
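The inductive proof of Proposition 37 also yields an algorithm: merge the congruences one at a time, using a Bézout identity for the current modulus and the next one, and stop if a compatibility condition fails. A minimal sketch, assuming positive moduli (the names `ext_gcd` and `crt` are illustrative). It tests each new congruence against the accumulated modulus rather than pairwise, which the proof shows to be equivalent, and it recovers Sunzi’s solution.

```python
def ext_gcd(a, b):
    """Return (d, x, y) with d = gcd(a, b) = a*x + b*y, for a, b >= 0."""
    if b == 0:
        return a, 1, 0
    d, x, y = ext_gcd(b, a % b)
    return d, y, x - (a // b) * y


def crt(residues, moduli):
    """Solve x = a_i mod m_i for positive moduli, not necessarily coprime.

    Returns (x, L), where L = [m_1, ..., m_n] and x is the least non-negative
    solution, or None if the congruences are incompatible."""
    x, m = 0, 1
    for a, mi in zip(residues, moduli):
        d, u, _ = ext_gcd(m, mi)     # d = u*m + v*mi
        if (a - x) % d != 0:
            return None              # x = a fails modulo gcd(m, mi)
        x += (a - x) // d * u * m    # unchanged mod m, and now = a mod mi
        m = m // d * mi              # lcm(m, mi)
        x %= m
    return x, m


print(crt([2, 3, 2], [3, 5, 7]))    # (23, 105): Sunzi's problem
print(crt([1, 3], [4, 6]))          # (9, 12): the moduli need not be coprime
print(crt([1, 2], [4, 6]))          # None: 1 and 2 differ mod (4, 6) = 2
```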


Qin Jiushao (1247) gave a general procedure for solving simultaneous congruences, 
the moduli of which need not be pairwise relatively prime, although he did not state 
the necessary condition for the existence of a solution. The problem appears to have 
its origin in the construction of calendars. 


6 Sums of Squares 


Which positive integers n can be represented as a sum of two squares of integers? The 
question is answered completely by the following proposition, which was stated by 
Girard (1625). Fermat (1645) claimed to have a proof, but the first published proof 
was given by Euler (1754). 


Proposition 39 A positive integer n can be represented as a sum of two squares if and
only if for each prime $p \equiv 3 \bmod 4$ that divides n, the highest power of p dividing n is
even.


Proof We observe first that, since

$$(x^2 + y^2)(u^2 + v^2) = (xu + yv)^2 + (xv - yu)^2,$$

any product of sums of two squares is again a sum of two squares.

Suppose $n = x^2 + y^2$ for some integers x, y and that n is divisible by a prime
$p \equiv 3 \bmod 4$. Then $x^2 \equiv -y^2 \bmod p$. But $-1$ is not a square in the field $\mathbb{F}_p$, by
Corollary 29. Consequently we must have $y^2 \equiv x^2 \equiv 0 \bmod p$. Thus p divides both
x and y. Hence $p^2$ divides n and $n/p^2 = (x/p)^2 + (y/p)^2$. It follows by induction
that the highest power of p which divides n is even.

Thus the condition in the statement of the proposition is necessary. Suppose now
that this condition is satisfied. Then $n = qm^2$, where q is square-free and the only
possible prime divisors of q are 2 and primes $p \equiv 1 \bmod 4$. Since $m^2 = m^2 + 0^2$ and
$2 = 1^2 + 1^2$, it follows from our initial observation that n is a sum of two squares if
every prime $p \equiv 1 \bmod 4$ is a sum of two squares. Following Gauss (1832), we will
prove this with the aid of complex numbers.

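A complex number $\gamma = a + bi$ is said to be a Gaussian integer if $a, b \in \mathbb{Z}$. The set
of all Gaussian integers will be denoted by $\mathcal{G}$. Evidently $\gamma \in \mathcal{G}$ implies $\bar\gamma \in \mathcal{G}$, where
$\bar\gamma = a - bi$ is the complex conjugate of $\gamma$. Moreover $\alpha, \beta \in \mathcal{G}$ implies $\alpha \pm \beta \in \mathcal{G}$
and $\alpha\beta \in \mathcal{G}$. Thus $\mathcal{G}$ is a commutative ring. In fact $\mathcal{G}$ is an integral domain, since it
is a subset of the field $\mathbb{C}$. We are going to show that $\mathcal{G}$ can be given the structure of a
Euclidean domain.

Define the norm of a complex number $\gamma = a + bi$ to be

$$N(\gamma) = \gamma\bar\gamma = a^2 + b^2.$$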
Then $N(\gamma) \ge 0$, with equality if and only if $\gamma = 0$, and $N(\gamma_1\gamma_2) = N(\gamma_1)N(\gamma_2)$. If
$\gamma \in \mathcal{G}$, then $N(\gamma)$ is an ordinary integer. Furthermore, $\gamma$ is a unit in $\mathcal{G}$, i.e. $\gamma$ divides 1
in $\mathcal{G}$, if and only if $N(\gamma) = 1$.

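We wish to show that if $\alpha, \beta \in \mathcal{G}$ and $\alpha \ne 0$, then there exist $\kappa, \rho \in \mathcal{G}$ such that

$$\beta = \kappa\alpha + \rho, \quad N(\rho) < N(\alpha).$$

We have $\beta\alpha^{-1} = r + si$, where $r, s \in \mathbb{Q}$. Choose $a, b \in \mathbb{Z}$ so that

$$|r - a| \le 1/2, \quad |s - b| \le 1/2.$$

If $\kappa = a + bi$, then $\kappa \in \mathcal{G}$ and

$$N(\beta\alpha^{-1} - \kappa) \le 1/4 + 1/4 = 1/2 < 1.$$

Hence if $\rho = \beta - \kappa\alpha$, then $\rho \in \mathcal{G}$ and $N(\rho) = N(\beta\alpha^{-1} - \kappa)N(\alpha) < N(\alpha)$.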

It follows that we can apply to $\mathcal{G}$ the whole theory of divisibility in a Euclidean
domain. Now let p be a prime such that $p \equiv 1 \bmod 4$. We will show that p is a sum of
two squares by constructing $\beta \in \mathcal{G}$ for which $N(\beta) = p$.

By Corollary 29, there exists an integer a such that $a^2 \equiv -1 \bmod p$. Put $\alpha = a + i$.
Then $N(\alpha) = \alpha\bar\alpha = a^2 + 1$ is divisible by p in $\mathbb{Z}$ and hence also in $\mathcal{G}$. However,
neither $\alpha$ nor $\bar\alpha$ is divisible by p in $\mathcal{G}$, since $\alpha p^{-1}$ and $\bar\alpha p^{-1}$ are not in $\mathcal{G}$. Thus p is not
a prime in $\mathcal{G}$ and consequently, since $\mathcal{G}$ is a Euclidean domain, it has a factorization
$p = \beta\gamma$, where neither $\beta$ nor $\gamma$ is a unit. Hence $N(\beta) > 1$, $N(\gamma) > 1$. Since

$$N(\beta)N(\gamma) = N(p) = p^2,$$

it follows that $N(\beta) = N(\gamma) = p$.
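The argument is effective. Given a with $a^2 \equiv -1 \bmod p$, the Euclidean algorithm in $\mathcal{G}$ applied to p and $a + i$ yields a Gaussian integer of norm p: the gcd is a proper factor of p, since p divides $N(a + i)$ but not $a + i$. The sketch below represents Gaussian integers as integer pairs (re, im) and performs the division step above by rounding $\beta\bar\alpha / N(\alpha)$ to the nearest Gaussian integer; the helper names are illustrative. It obtains a by raising a quadratic nonresidue to the power $(p-1)/4$, which works by Euler’s criterion.

```python
def g_mul(x, y):
    """Product of Gaussian integers x = (a, b) and y = (c, d), meaning a + bi and c + di."""
    return (x[0] * y[0] - x[1] * y[1], x[0] * y[1] + x[1] * y[0])


def g_norm(x):
    return x[0] * x[0] + x[1] * x[1]


def g_rem(beta, alpha):
    """rho = beta - kappa*alpha, where kappa is beta/alpha rounded to the nearest Gaussian integer."""
    n = g_norm(alpha)
    num = g_mul(beta, (alpha[0], -alpha[1]))             # beta * conj(alpha)
    kappa = tuple((2 * c + n) // (2 * n) for c in num)   # nearest integers to num / n
    prod = g_mul(kappa, alpha)
    return (beta[0] - prod[0], beta[1] - prod[1])        # N(rho) <= N(alpha) / 2


def g_gcd(alpha, beta):
    while beta != (0, 0):
        alpha, beta = beta, g_rem(alpha, beta)
    return alpha


def two_squares(p):
    """(x, y) with x^2 + y^2 = p, for a prime p = 1 mod 4."""
    c = next(c for c in range(2, p) if pow(c, (p - 1) // 2, p) == p - 1)  # a nonresidue
    a = pow(c, (p - 1) // 4, p)                                          # a^2 = -1 mod p
    x, y = g_gcd((p, 0), (a, 1))
    return abs(x), abs(y)


print(two_squares(13))   # (3, 2)
```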


Proposition 39 solves the problem of representing a positive integer as a sum of
two squares. What if we allow more than two squares? When congruences were first
introduced in §5, it was observed that a positive integer $a \equiv 7 \bmod 8$ could not be
represented as a sum of three squares. It was first completely proved by Gauss (1801)
that a positive integer can be represented as a sum of three squares if and only if it is
not of the form $4^n a$, where $n \ge 0$ and $a \equiv 7 \bmod 8$. The proof of this result is more
difficult, and will be given in Chapter VII.

It was conjectured by Bachet (1621) that every positive integer can be represented 
as a sum of four squares. Fermat claimed to have a proof, but the first published proof 
was given by Lagrange (1770), using earlier ideas of Euler (1751). The proof of the 
four-squares theorem we will give is similar to that just given for the two-squares 
theorem, with complex numbers replaced by quaternions. 


Proposition 40 Every positive integer n can be represented as a sum of four squares. 


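Proof A quaternion $\gamma = a + bi + cj + dk$ will be said to be a Hurwitz integer
if a, b, c, d are either all integers or all halves of odd integers. The set of all
Hurwitz integers will be denoted by $\mathcal{H}$. Evidently $\gamma \in \mathcal{H}$ implies $\bar\gamma \in \mathcal{H}$, where
$\bar\gamma = a - bi - cj - dk$. Moreover $\alpha, \beta \in \mathcal{H}$ implies $\alpha \pm \beta \in \mathcal{H}$. We will show that
$\alpha, \beta \in \mathcal{H}$ also implies $\alpha\beta \in \mathcal{H}$.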
Evidently $\gamma \in \mathcal{H}$ if and only if it can be written in the form
$\gamma = a_0h + a_1i + a_2j + a_3k$, where $a_0, a_1, a_2, a_3 \in \mathbb{Z}$ and $h = (1 + i + j + k)/2$. It is obvious that the
product of h with i, j or k is again in $\mathcal{H}$ and it is easily verified that $h^2 = h - 1$. It
follows that $\mathcal{H}$ is closed under multiplication and hence is a ring.

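Define the norm of a quaternion $\gamma = a + bi + cj + dk$ to be

$$N(\gamma) = \gamma\bar\gamma = a^2 + b^2 + c^2 + d^2.$$

Then $N(\gamma) \ge 0$, with equality if and only if $\gamma = 0$. Moreover, since $\overline{\gamma_1\gamma_2} = \bar\gamma_2\bar\gamma_1$
and the real number $\gamma_2\bar\gamma_2$ commutes with every quaternion,

$$N(\gamma_1\gamma_2) = \gamma_1\gamma_2\bar\gamma_2\bar\gamma_1 = \gamma_1\bar\gamma_1\gamma_2\bar\gamma_2 = N(\gamma_1)N(\gamma_2).$$

If $\gamma \in \mathcal{H}$, then $N(\gamma) = \gamma\bar\gamma \in \mathcal{H}$ and hence $N(\gamma)$ is an ordinary integer. Furthermore,
$\gamma$ is a unit in $\mathcal{H}$, i.e. $\gamma$ divides 1 in $\mathcal{H}$, if and only if $N(\gamma) = 1$.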
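The claims that $h^2 = h - 1$, that $\mathcal{H}$ is closed under multiplication and that the norm is multiplicative can be spot-checked with exact rational arithmetic. In this sketch (illustrative names) a quaternion $a + bi + cj + dk$ is a 4-tuple of `fractions.Fraction` values, multiplied by the Hamilton product; the sample is a small finite set of Hurwitz integers, so this is a sanity check, not a proof.

```python
from fractions import Fraction
from itertools import product


def q_mul(x, y):
    """Hamilton product of quaternions given as 4-tuples (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = x
    a2, b2, c2, d2 = y
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)


def norm(x):
    return sum(t * t for t in x)


def is_hurwitz(x):
    """All coordinates integers, or all halves of odd integers."""
    return all(t.denominator == 1 for t in x) or all(t.denominator == 2 for t in x)


half = Fraction(1, 2)
h = (half, half, half, half)
print(q_mul(h, h) == (h[0] - 1, h[1], h[2], h[3]))   # True: h^2 = h - 1

# Closure and multiplicativity of the norm on a small sample of Hurwitz integers.
sample = [tuple(Fraction(t) for t in v) for v in product(range(-1, 2), repeat=4)]
sample += [tuple(t + half for t in v) for v in sample]
for x, y in product(sample, repeat=2):
    xy = q_mul(x, y)
    assert is_hurwitz(xy) and norm(xy) == norm(x) * norm(y)
```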

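We now show that a Euclidean algorithm may be defined on $\mathcal{H}$. Suppose
$\alpha, \beta \in \mathcal{H}$ and $\alpha \ne 0$. Then

$$\beta\alpha^{-1} = r_0 + r_1i + r_2j + r_3k,$$

where $r_0, r_1, r_2, r_3 \in \mathbb{Q}$. If $\kappa = a_0h + a_1i + a_2j + a_3k$, then

$$\begin{aligned}
\beta\alpha^{-1} - \kappa = {} & (r_0 - a_0/2) + (r_1 - a_0/2 - a_1)i + (r_2 - a_0/2 - a_2)j \\
& + (r_3 - a_0/2 - a_3)k.
\end{aligned}$$

We can choose $a_0 \in \mathbb{Z}$ so that $|2r_0 - a_0| \le 1/2$ and then choose $a_\nu \in \mathbb{Z}$ so that
$|r_\nu - a_0/2 - a_\nu| \le 1/2$ ($\nu = 1, 2, 3$). Then $\kappa \in \mathcal{H}$ and

$$N(\beta\alpha^{-1} - \kappa) \le 1/16 + 3/4 = 13/16 < 1.$$

Thus if we set $\rho = \beta - \kappa\alpha$, then $\rho \in \mathcal{H}$ and

$$N(\rho) = N(\beta\alpha^{-1} - \kappa)N(\alpha) < N(\alpha).$$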


By repeating this division process finitely many times we see that any $\alpha, \beta \in \mathcal{H}$
have a greatest common right divisor $\delta = (\alpha, \beta)_r$. Furthermore, there is a left Bézout
identity: $\delta = \xi\alpha + \eta\beta$ for some $\xi, \eta \in \mathcal{H}$.

If a positive integer n is a sum of four squares, say $n = a^2 + b^2 + c^2 + d^2$, then
$n = \gamma\bar\gamma$, where $\gamma = a + bi + cj + dk \in \mathcal{H}$. Since the norm of a product is the product
of the norms, it follows that any product of sums of four squares is again a sum of four
squares. Hence to prove the proposition we need only show that any prime p is a sum
of four squares.

We show first that there exist integers a, b such that $a^2 + b^2 \equiv -1 \bmod p$. This
follows from the illustration given for Proposition 34, but we will give a direct proof.

If $p = 2$, we can take $a = 1$, $b = 0$. If $p \equiv 1 \bmod 4$ then, by Corollary 29, there
exists an integer a such that $a^2 \equiv -1 \bmod p$ and we can take $b = 0$. Suppose now that
$p \equiv 3 \bmod 4$. Let c be the least positive quadratic non-residue of p. Then $c \ge 2$ and
$c - 1$ is a quadratic residue of p. On the other hand, $-1$ is a quadratic non-residue of
p, by Corollary 29. Hence, by Proposition 28, $-c$ is a quadratic residue. Thus there
exist integers a, b such that

$$a^2 \equiv -c, \quad b^2 \equiv c - 1 \bmod p,$$

and then $a^2 + b^2 \equiv -1 \bmod p$.
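The case analysis translates directly into a short search. In this sketch (illustrative names) square roots mod p are found by brute force, which is adequate only for small p, and quadratic residues are recognized by Euler’s criterion.

```python
def is_qr(x, p):
    """Euler's criterion, for an odd prime p not dividing x."""
    return pow(x, (p - 1) // 2, p) == 1


def sqrt_mod(x, p):
    """Some c with c^2 = x mod p, by brute force."""
    return next(c for c in range(p) if (c * c - x) % p == 0)


def minus_one_as_two_squares(p):
    """Integers (a, b) with a^2 + b^2 = -1 mod p, following the case analysis above."""
    if p == 2:
        return 1, 0
    if p % 4 == 1:
        return sqrt_mod(-1, p), 0
    c = next(c for c in range(2, p) if not is_qr(c, p))   # least quadratic non-residue
    return sqrt_mod(-c, p), sqrt_mod(c - 1, p)


print(minus_one_as_two_squares(7))   # (2, 3): here c = 3, and 4 + 9 = 13 = -1 mod 7
```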
Put $\alpha = 1 + ai + bj$. Then p divides $N(\alpha) = \alpha\bar\alpha = 1 + a^2 + b^2$ in $\mathbb{Z}$ and hence
also in $\mathcal{H}$. However, p does not divide either $\alpha$ or $\bar\alpha$ in $\mathcal{H}$, since $\alpha p^{-1}$ and $\bar\alpha p^{-1}$ are
not in $\mathcal{H}$.

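Let $\gamma = (p, \alpha)_r$. Then $p = \beta\gamma$ for some $\beta \in \mathcal{H}$. If $\beta$ were a unit, p would be a
right divisor of $\gamma$ and hence also of $\alpha$, which is a contradiction. Therefore $N(\beta) > 1$.
Evidently $\gamma\bar\alpha$ is a common right divisor of $p\bar\alpha$ and $\alpha\bar\alpha$, and the Bézout representation
for $\gamma$ implies that $\gamma\bar\alpha = (p\bar\alpha, \alpha\bar\alpha)_r$. Since $p\bar\alpha = \bar\alpha p$ and p divides $\alpha\bar\alpha$, it follows
that p is a right divisor of $\gamma\bar\alpha$. Since p does not divide $\bar\alpha$, $\gamma$ is not a unit and hence
$N(\gamma) > 1$. Since

$$N(\beta)N(\gamma) = N(p) = p^2,$$

we must have $N(\beta) = N(\gamma) = p$.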

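Thus if $\gamma = c_0 + c_1i + c_2j + c_3k$, then $c_0^2 + c_1^2 + c_2^2 + c_3^2 = p$. If $c_0, \ldots, c_3$ are
all integers, we are finished. Otherwise $c_0, \ldots, c_3$ are all halves of odd integers. Hence
we can write $c_\nu = 2d_\nu + e_\nu$, where $d_\nu \in \mathbb{Z}$ and $e_\nu = \pm 1/2$. If we put

$$\delta = d_0 + d_1i + d_2j + d_3k, \quad \varepsilon = e_0 + e_1i + e_2j + e_3k,$$

then $\gamma = 2\delta + \varepsilon$ and $N(\varepsilon) = 1$. Hence $\theta := \gamma\bar\varepsilon = 2\delta\bar\varepsilon + 1$ has all its coordinates
integers and $N(\theta) = N(\gamma) = p$.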


In his Meditationes Algebraicae, which also contains the first statement in print of 
Wilson’s theorem, Waring (1770) stated that every positive integer is a sum of at most 
4 positive integral squares, of at most 9 positive integral cubes and of at most 19 posi- 
tive integral fourth powers. The statement concerning squares was proved by Lagrange 
in the same year, as we have seen. The statement concerning cubes was first proved by 
Wieferich (1909), with a gap filled by Kempner (1912), and the statement concerning 
fourth powers was first proved by Balasubramanian, Deshouillers and Dress (1986). 

In a later edition of his book, Waring (1782) raised the same question for higher
powers. Waring’s problem was first solved by Hilbert (1909), who showed that, for
each $k \in \mathbb{N}$, there exists $\gamma_k \in \mathbb{N}$ such that every positive integer is a sum of at most
$\gamma_k$ k-th powers. The least possible value of $\gamma_k$ is traditionally denoted by g(k). For
example, $g(2) = 4$, since $7 = 2^2 + 3 \cdot 1^2$ is not a sum of fewer than 4 squares.

A lower bound for $g(k)$ was already derived by Euler (c. 1772). Let $m = \lfloor (3/2)^k \rfloor$
denote the greatest integer $\le (3/2)^k$ and take

$$n = 2^k m - 1.$$

Since $1 \le n < 3^k$, the only $k$-th powers of which $n$ can be the sum are $1^k$ and $2^k$.
Since the number of powers $2^k$ must be less than $m$, and since $n = (m - 1)2^k + (2^k - 1)1^k$,
the least number of $k$-th powers with sum $n$ is $m + 2^k - 2$. Hence
$g(k) \ge w(k)$, where

$$w(k) = \lfloor (3/2)^k \rfloor + 2^k - 2.$$

In particular,

$$w(2) = 4, \quad w(3) = 9, \quad w(4) = 19, \quad w(5) = 37, \quad w(6) = 73.$$
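Euler's lower bound is easy to confirm numerically for small $k$. The sketch below is an illustration only and uses a plain dynamic program, which is not a method from the text. The hypothetical helper `min_kth_powers(n, k)` returns the least number of positive $k$-th powers with sum $n$, and `w(k)` evaluates $\lfloor (3/2)^k \rfloor + 2^k - 2$ exactly as `3**k // 2**k + 2**k - 2`.

```python
def min_kth_powers(n: int, k: int) -> int:
    """Least number of positive k-th powers with sum n (dynamic programming)."""
    powers = []
    b = 1
    while b ** k <= n:
        powers.append(b ** k)
        b += 1
    best = [0] + [n + 1] * n          # best[v] = fewest k-th powers with sum v
    for v in range(1, n + 1):
        for p in powers:              # powers are in increasing order
            if p > v:
                break
            best[v] = min(best[v], best[v - p] + 1)
    return best[n]

def w(k: int) -> int:
    # floor((3/2)^k) computed exactly as 3^k // 2^k
    return 3 ** k // 2 ** k + 2 ** k - 2

for k in range(2, 7):
    m = 3 ** k // 2 ** k
    n = 2 ** k * m - 1                # Euler's extremal integer
    print(k, n, min_kth_powers(n, k), w(k))
```

For $k = 2, \ldots, 6$ the loop should report that Euler's integer $n = 2^k m - 1$ needs exactly $w(k)$ powers. The same helper also gives 9 for the integers 23 and 239 discussed below in connection with cubes.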
By the results stated above, $g(k) = w(k)$ for $k = 2, 3, 4$, and this has been shown to
hold also for $k = 5$ by Chen (1964) and for $k = 6$ by Pillai (1940).

Hilbert’s method of proof yielded rather large upper bounds for g(k). A completely 
new approach was developed in the 1920s by Hardy and Littlewood, using their analytic
‘circle’ method. They showed that, for each $k \in \mathbb{N}$, there exists $\Gamma_k \in \mathbb{N}$ such that
every sufficiently large positive integer is a sum of at most $\Gamma_k$ $k$-th powers. The least
possible value of $\Gamma_k$ is traditionally denoted by $G(k)$. For example, $G(2) = 4$, since
no positive integer $n \equiv 7 \bmod 8$ is a sum of fewer than four squares. Davenport (1939)
showed that $G(4) = 16$, but these are the only two values of $k$ for which $G(k)$
is known exactly today.

It is obvious that $G(k) \le g(k)$, and in fact $G(k) < g(k)$ for all $k > 2$. In particular,
Dickson (1939) showed that 23 and 239 are the only positive integers which
require the maximum 9 cubes. Hardy and Littlewood obtained the upper bound
$G(k) \le (k - 2)2^{k-1} + 5$, but this has been repeatedly improved by Hardy and Littlewood
themselves, Vinogradov and others. For example, Wooley (1992) has shown that
$G(k) \le k(\log k + \log\log k + O(1))$.

By using the upper bound for $G(k)$ of Vinogradov (1935), it was shown by
Dickson, Pillai and Niven (1936–1944) that $g(k) = w(k)$ for any given $k > 6$,
provided that

$$(3/2)^k - \lfloor (3/2)^k \rfloor \le 1 - 2^{-k}\lfloor (3/2)^k \rfloor.$$


It is possible that this inequality holds for every $k \in \mathbb{N}$. For a given $k$, it may be checked
by direct calculation, and Kubina and Wunderlich (1990) have verified in this way that
the inequality holds if $k \le 471\,600\,000$. Furthermore, using a $p$-adic extension by
Ridout (1957) of the theorem of Roth (1955) on the approximation of algebraic numbers
by rationals, Mahler (1957) proved that there exists $k_0 \in \mathbb{N}$ such that the inequality
holds for all $k \ge k_0$. However, the proof does not provide a means of estimating $k_0$.
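For a given $k$ the condition can indeed be tested by direct calculation. The sketch below assumes the condition in the integer form $r + q \le 2^k$, where $3^k = q\,2^k + r$ with $0 < r < 2^k$. Then $q = \lfloor (3/2)^k \rfloor$ and $r/2^k$ is the fractional part of $(3/2)^k$, so this is the criterion $2^k\{(3/2)^k\} + \lfloor (3/2)^k \rfloor \le 2^k$ multiplied through by $2^k$. Exact integer arithmetic avoids any rounding problems. The function name `dickson_pillai_holds` is hypothetical.

```python
def dickson_pillai_holds(k: int) -> bool:
    """Check 2^k {(3/2)^k} + floor((3/2)^k) <= 2^k in exact integer arithmetic."""
    # 3^k = q * 2^k + r, so q = floor((3/2)^k) and r = 2^k * frac((3/2)^k)
    q, r = divmod(3 ** k, 2 ** k)
    return r + q <= 2 ** k

failures = [k for k in range(1, 1001) if not dickson_pillai_holds(k)]
print(failures)
```

Such a loop only covers a finite range of $k$. It illustrates what "direct calculation" means here and is no substitute for the large-scale verification cited above.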

Thus we have the bizarre situation that G(k) is known for only two values of k, 
that g(k) is known for a vast number of values of k and is given by a simple formula, 
probably for all k, but the information about g(k) is at present derived from informa- 
tion about G(k). Is it too much to hope that an examination of the numerical data will 
reveal some pattern in the fractional parts of $(3/2)^k$?
There are many good introductory books on the theory of numbers, e.g. Davenport [4],
LeVeque [28] and Scholz [41]. More extensive accounts are given in Hardy and 
Wright [15], Hua [18], Narkiewicz [33] and Niven et al. [34]. 

Historical information is provided by Dickson [5], Smith [42] and Weil [46], as 
well as the classics Euclid [11], Gauss [13] and Dirichlet [6]. Gauss’s masterpiece is 
quoted here and in the text as ‘D.A.’ 

The reader is warned that, besides its use in §1, the word ‘lattice’ also has quite a 
different mathematical meaning, which will be encountered in Chapter VII. 

The basic theory of divisibility is discussed more thoroughly than in the usual texts 
by Stieltjes [43]. For Proposition 6, see Prüfer [35]. In the theory of groups, Schreier’s
refinement theorem and the Jordan–Hölder theorem may be viewed as generalizations
of Propositions 6 and 7. These theorems are stated and proved in Chapter I, §3 of
Lang [23]. The fundamental theorem of arithmetic (Proposition 7) is usually attributed
to Gauss (D.A., §16). However, it is really contained in Euclid’s Elements (Book VII,
Proposition 31 and Book IX, Proposition 14), except for the appropriate terminology. 
Perhaps this is why Euler and his contemporaries simply assumed it without proof. 

Generalizations of the fundamental theorem of arithmetic to other algebraic struc- 
tures are discussed in Chap. 2 of Jacobson [21]. For factorial domains, see Samuel [39]. 

Our discussion of the fundamental theorem did not deal with the practical problems 
of deciding if a given integer is prime or composite and, in the latter case, of obtaining 
its factorization into primes. Evidently if the integer $a$ is composite, its least prime
factor $p$ satisfies $p^2 \le a$. In former days one used this observation in conjunction with
tables, such as [24], [25], [26]. With new methods and supercomputers, the primal- 
ity of integers with hundreds of digits can now be determined without difficulty. The 
progress in this area may be traced through the survey articles [48], [7] and [27]. Fac- 
torization remains a more difficult problem, and this difficulty has found an important 
application in public-key cryptography; see Rivest et al. [37]. 
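The observation $p^2 \le a$ underlies the classical method of trial division, sketched below in Python. It is only practical for small integers. The modern primality tests and factorization methods surveyed in the references above work very differently. The function names are illustrative.

```python
def least_prime_factor(a: int) -> int:
    """Least prime factor of a >= 2; trial division stops once p*p > a."""
    if a < 2:
        raise ValueError("a must be at least 2")
    p = 2
    while p * p <= a:
        if a % p == 0:
            return p
        p += 1 if p == 2 else 2      # after 2, only odd trial divisors
    return a                         # no divisor p with p^2 <= a, so a is prime

def factorize(a: int) -> list[int]:
    """Prime factors of a >= 1 in nondecreasing order, with repetition."""
    factors = []
    while a > 1:
        p = least_prime_factor(a)
        factors.append(p)
        a //= p
    return factors

print(factorize(2985), least_prime_factor(1951))
```

Odd composite trial divisors such as 9 are harmless: by the time they are tried, their prime factors have already been ruled out.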

For Proposition 12, cf. Hillman and Hoggatt [17]. A proof that the ring of all al- 
gebraic integers is a Bézout domain is given on p. 86 of Mann [31]. The ring of all 
functions which are holomorphic in a given region was shown to be a Bézout domain 
by Wedderburn (1915); see Narasimhan [32]. 

For Gauss’s version of Proposition 17, see D.A., §42. It is natural to ask if Corol- 
lary 18 remains valid if the polynomial ring R[t] is replaced by the ring R[[t]] of 
formal power series. The ring $K[[t_1, \ldots, t_m]]$ of all formal power series in finitely
many indeterminates with coefficients from an arbitrary field $K$ is indeed a factorial
domain. However, if $R$ is a factorial domain, the integral domain $R[[t]]$ of all formal
power series in $t$ with coefficients from $R$ need not be factorial. For an example in
which R is actually a complete local ring, see Salmon [38]. 

For generalizations of Eisenstein’s irreducibility criterion (Proposition 19), see 
Gao [12]. Proposition 21 is proved in Rhai [36]. Euclidean domains are studied further 
in Samuel [40]. Quadratic fields $\mathbb{Q}(\sqrt{d})$ whose ring of integers $\mathcal{O}_d$ is Euclidean are
discussed in Clark [3], Dubois and Steger [8] and Eggleton et al. [9]. 

Congruences are discussed in all the books on number theory cited above. In con- 
nection with Lemma 32 we mention a result of Frobenius (1895). Frobenius proved 
that if G is a finite group of order n and if d is a positive divisor of n, then the number 
of elements of G whose order divides d is a multiple of d. He conjectured that if the 
number is exactly d, then these elements form a (normal) subgroup of G. The conjec- 
ture can be reduced to the case where G is simple, since a counterexample of minimal 
order must be a noncyclic simple group. By appealing to the recent classification of all 
finite simple groups (see Chapter V, §7), the proof of the conjecture was completed by 
Iiyori and Yamaki [20].
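Frobenius's divisibility theorem can be checked by brute force in a small case. The sketch below is our own and treats only the symmetric groups $S_3$ and $S_4$, with permutations stored as tuples. For each divisor $d$ of the group order it counts the elements $g$ with $g^d = 1$. The helper names are hypothetical.

```python
from itertools import permutations

def compose(g, h):
    """(g h)(i) = g(h(i)) for permutations stored as tuples."""
    return tuple(g[i] for i in h)

def has_order_dividing(g, d: int) -> bool:
    identity = tuple(range(len(g)))
    x = identity
    for _ in range(d):
        x = compose(g, x)
    return x == identity

def frobenius_counts(n: int) -> dict[int, int]:
    """For S_n, map each divisor d of |S_n| to the number of g with g^d = 1."""
    group = list(permutations(range(n)))
    order = len(group)
    return {d: sum(has_order_dividing(g, d) for g in group)
            for d in range(1, order + 1) if order % d == 0}

counts = frobenius_counts(4)
print(counts)
assert all(c % d == 0 for d, c in counts.items())
```

For $S_4$ the counts are 1, 10, 9, 16, 18, 16, 24, 24 for $d = 1, 2, 3, 4, 6, 8, 12, 24$. Each is a multiple of $d$, as the theorem predicts.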

There is a table of primitive roots on pp. 52-56 of Hua [18]. For more extensive 
tables, see Western and Miller [47]. 

It is easily seen that an even square is never a primitive root, that an odd square 
(including 1) is a primitive root only for the prime $p = 2$, and that $-1$ is a primitive
root only for the primes $p = 2, 3$. Artin (1927) conjectured that if the integer $a$ is not
a square or $-1$, then it is a primitive root for infinitely many primes $p$. (A quantitative
form of the conjecture is considered in Chapter IX.) If the conjecture is not true, then 
it is almost true, since it has been shown by Heath-Brown [16] that there are at most 
3 square-free positive integers a for which it fails. 
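The easy remarks above, and the kind of numerical data that motivated Artin, can be explored with a short computation. The following sketch lists the primes below a bound for which a given integer $a$ is a primitive root, by computing multiplicative orders directly. It is an illustration only and says nothing about the conjecture itself. The function names are hypothetical.

```python
def is_prime(n: int) -> bool:
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def multiplicative_order(a: int, p: int) -> int:
    """Order of a modulo the prime p; assumes p does not divide a."""
    k, x = 1, a % p
    while x != 1:
        x = x * a % p
        k += 1
    return k

def primes_with_primitive_root(a: int, limit: int) -> list[int]:
    """Primes p < limit, not dividing a, for which a is a primitive root mod p."""
    return [p for p in range(2, limit)
            if is_prime(p) and a % p != 0 and multiplicative_order(a, p) == p - 1]

print(primes_with_primitive_root(2, 60))
```

For $a = 2$ and the bound 60 this should return 3, 5, 11, 13, 19, 29, 37, 53, 59. With $a = -1$ only 2 and 3 appear, and with $a = 4$ (an even square) the list is empty.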

A finite subgroup of the multiplicative group of a division ring need not be cyclic. 
For example, if $\mathbb{H}$ is the division ring of Hamilton’s quaternions, $\mathbb{H}$ contains the
non-cyclic subgroup $\{\pm 1, \pm i, \pm j, \pm k\}$ of order 8. All possible finite subgroups of the
multiplicative group of a division ring have been determined (with the aid of class field 
theory) by Amitsur [2]. 

For Carmichael numbers, see Alford et al. [1]. 

Galois (1830) showed that there were other finite fields besides $\mathbb{F}_p$, and indeed, as
Moore (1893) later proved, he found them all. Finite fields have the following basic 
properties: 


(i) The number of elements in a finite field is a prime power $p^n$, where $n \in \mathbb{N}$ and
the prime $p$ is the characteristic of the field.

(ii) For any prime power $q = p^n$, there is a finite field $\mathbb{F}_q$ containing exactly $q$
elements. Moreover the field $\mathbb{F}_q$ is unique, up to isomorphism, and is the
splitting field of the polynomial $t^q - t$ over $\mathbb{F}_p$.

(iii) For any finite field $\mathbb{F}_q$, the multiplicative group $\mathbb{F}_q^\times$ of nonzero elements is cyclic.

(iv) If $q = p^n$, the map $\sigma: a \mapsto a^p$ is an automorphism of $\mathbb{F}_q$, and the distinct
automorphisms of $\mathbb{F}_q$ are the powers $\sigma^k$ $(k = 0, 1, \ldots, n-1)$.
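Properties (ii) to (iv) can be seen concretely in the field with 9 elements. The sketch assumes the model $\mathbb{F}_9 = \mathbb{F}_3[t]/(t^2 + 1)$. This is a field because $t^2 + 1$ has no root mod 3. An element $a + bt$ is stored as the pair `(a, b)`. The code checks that $\sigma: x \mapsto x^3$ is a nontrivial automorphism with $\sigma^2 = 1$, that every element is a root of $t^9 - t$, and that $1 + t$ generates the multiplicative group. The helper names are hypothetical.

```python
P = 3   # F_9 modelled as F_3[t]/(t^2 + 1); the pair (a, b) stands for a + b t

def add(x, y):
    return ((x[0] + y[0]) % P, (x[1] + y[1]) % P)

def mul(x, y):
    a, b = x
    c, d = y
    # (a + bt)(c + dt) = ac + (ad + bc) t + bd t^2, and t^2 = -1
    return ((a * c - b * d) % P, (a * d + b * c) % P)

def power(x, e):
    result = (1, 0)
    for _ in range(e):
        result = mul(result, x)
    return result

def mult_order(x):
    """Multiplicative order of a nonzero element x."""
    e, y = 1, x
    while y != (1, 0):
        y = mul(y, x)
        e += 1
    return e

field = [(a, b) for a in range(P) for b in range(P)]
frob = {x: power(x, P) for x in field}            # sigma(x) = x^3

print(frob[(0, 1)], mult_order((1, 1)))           # sigma(t) = -t = 2t; 1 + t has order 8
```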


The theorem of Chevalley and Warning (Proposition 34) extends immediately to 
arbitrary finite fields. Proofs and more detailed information on finite fields may be 
found in Lidl and Niederreiter [30] and in Joly [22]. 

A celebrated theorem of Wedderburn (1905) states that any finite division ring is 
a field, i.e. the commutative law of multiplication is a consequence of the other field 
axioms if the number of elements is finite. Here is a purely algebraic proof. 

Assume there exists a finite division ring which is not a field and let D be one 
of minimum cardinality. Let $C$ be the centre of $D$ and $a \in D \setminus C$. The set $M$ of all
elements of $D$ which commute with $a$ is a field, since it is a division ring but not the
whole of $D$, and hence has smaller cardinality than $D$. Evidently $M$ is a maximal subfield
of $D$ which contains $a$. If $[D : C] = n$ and $[M : C] = m$ then, by Proposition I.32,
$[D : M] = m$ and $n = m^2$. Thus $m$ is independent of $a$.

If $C$ has cardinality $q$, then $D$ has cardinality $q^n$, $M$ has cardinality $q^m$ and the
number of conjugates of $a$ in $D$ is $(q^n - 1)/(q^m - 1)$. Since this holds for every
$a \in D \setminus C$, the partition of the multiplicative group of $D$ into conjugacy classes shows that

$$q^n - 1 = q - 1 + r(q^n - 1)/(q^m - 1)$$

for some positive integer $r$. Hence $q - 1$ is divisible by

$$(q^n - 1)/(q^m - 1) = 1 + q^m + \cdots + q^{m(m-1)}.$$

Since $n > m > 1$, this quotient exceeds $q - 1 \ge 1$, which is a contradiction.
For the history of the Chinese remainder theorem (not only in China), see 
Libbrecht [29]. 
We have developed the arithmetic of quaternions only as far as is needed to prove
the four-squares theorem. A fuller account was given in the original (1896) paper
of Hurwitz [19]. For more information about sums of squares, see Grosswald [14] 
and also Chapter XIII. For Waring’s problem, see Waring [45], Ellison [10] and 
Vaughan [44]. 
More on Divisibility


In this chapter the theory of divisibility is developed further. The various sections of the 
chapter are to a large extent independent. We consider in turn the law of quadratic reci- 
procity, quadratic fields, multiplicative functions, and linear Diophantine equations. 


1 The Law of Quadratic Reciprocity 


Let p be an odd prime. An integer a which is not divisible by p is said to be a quadratic 
residue, or quadratic nonresidue, of p according as the congruence 


$$x^2 \equiv a \bmod p$$

has, or has not, a solution $x$. We will speak of the quadratic nature of $a$ mod $p$,
meaning whether $a$ is a quadratic residue or nonresidue of $p$.

Let $q$ be an odd prime different from $p$. The law of quadratic reciprocity connects
the quadratic nature of $q$ mod $p$ with the quadratic nature of $p$ mod $q$. It states that if
either $p$ or $q$ is congruent to 1 mod 4, then the quadratic nature of $q$ mod $p$ is the same
as the quadratic nature of $p$ mod $q$, but if both $p$ and $q$ are congruent to 3 mod 4 then
the quadratic nature of $q$ mod $p$ is different from the quadratic nature of $p$ mod $q$.

This remarkable result plays a key role in the arithmetic theory of quadratic forms. 
It was discovered empirically by Euler (1783). Legendre (1785) gave a partial proof 
and later (1798) introduced the convenient ‘Legendre symbol’. The first complete 
proofs were given by Gauss (1801) in his Disquisitiones Arithmeticae. Indeed the re- 
sult so fascinated Gauss that during the course of his lifetime he gave eight proofs, four 
of them resting on completely different principles: an induction argument, the theory 
of binary quadratic forms, properties of sums of roots of unity, and a combinatorial 
lemma. The proof we are now going to give is also of a combinatorial nature. Its idea 
originated with Zolotareff (1872), but our treatment is based on Rousseau (1994). 

Let $n$ be a positive integer and let $X$ be the set $\{0, 1, \ldots, n-1\}$. As in §7 of
Chapter I, a permutation $\alpha$ of $X$ is said to be even or odd according as the total number
of inversions of order it induces is even or odd. If $a$ is an integer relatively prime to $n$,
then the map $\pi_a: X \to X$ defined by

$$\pi_a(x) = ax \bmod n$$


W.A. Coppel, Number Theory: An Introduction to Mathematics, Universitext, 129 
is a permutation of $X$. We define the Jacobi symbol $(a/n)$ to be $\operatorname{sgn}(\pi_a)$, i.e.

$$(a/n) = 1 \text{ or } -1$$

according as the permutation $\pi_a$ is even or odd. Thus $(a/1) = 1$ for every integer
a. (The definition is sometimes extended by putting (a/n) = 0 if a and n are not 
relatively prime.) 
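The definition of the Jacobi symbol as the sign of a permutation translates directly into code. The following Python sketch uses hypothetical helper names and counts the inversions of $\pi_a$ in quadratic time, so it is meant for small $n$ only.

```python
import math

def inversions(perm) -> int:
    """Number of pairs i < j with perm[i] > perm[j]."""
    n = len(perm)
    return sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])

def jacobi_by_permutation(a: int, n: int) -> int:
    """(a/n) = sgn(pi_a), where pi_a(x) = a x mod n on X = {0, ..., n-1}."""
    if math.gcd(a, n) != 1:
        raise ValueError("a must be relatively prime to n")
    perm = [a * x % n for x in range(n)]
    return -1 if inversions(perm) % 2 else 1

print(jacobi_by_permutation(2, 7), jacobi_by_permutation(3, 7))   # 1 -1
```

For example, $\pi_2$ on $\{0, \ldots, 6\}$ is $0, 2, 4, 6, 1, 3, 5$, with 6 inversions, so $(2/7) = 1$.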


Proposition 1 For any positive integer n and any integers a,b relatively prime to n, 
the Jacobi symbol has the following properties: 


(i) $(1/n) = 1$,
(ii) $(a/n) = (b/n)$ if $a \equiv b \bmod n$,
(iii) $(ab/n) = (a/n)(b/n)$,
(iv) $(-1/n) = 1$ if $n \equiv 1$ or $2 \bmod 4$, and $(-1/n) = -1$ if $n \equiv 3$ or $0 \bmod 4$.


Proof The first two properties follow at once from the definition of the Jacobi symbol.
If $a$ and $b$ are both relatively prime to $n$, then so also is their product $ab$. Since
$\pi_{ab} = \pi_a\pi_b$, we have $\operatorname{sgn}(\pi_{ab}) = \operatorname{sgn}(\pi_a)\operatorname{sgn}(\pi_b)$, which implies (iii). We now evaluate
$(-1/n)$. Since the map $\pi_{-1}: x \mapsto -x \bmod n$ fixes 0 and reverses the order of
$1, \ldots, n-1$, the total number of inversions of order is $(n-2) + (n-3) + \cdots + 1 =
(n-1)(n-2)/2$. It follows that $(-1/n) = (-1)^{(n-1)/2}$ or $(-1)^{(n-2)/2}$ according as
$n$ is odd or even. This proves (iv).


Proposition 2 For any relatively prime positive integers m, n, 


(i) if $m$ and $n$ are both odd, then $(m/n)(n/m) = (-1)^{(m-1)(n-1)/4}$;
(ii) if $m$ is odd and $n$ even, then $(m/n) = 1$ or $(-1)^{(m-1)/2}$ according as $n \equiv 2$ or
$0 \bmod 4$.


Proof The cyclic permutation $\tau: x \mapsto x + 1 \bmod n$ of the set $X = \{0, 1, \ldots, n-1\}$
has sign $(-1)^{n-1}$, since the number of inversions of order is $n - 1$. Hence, for any
integer $b \ge 0$ and any integer $a$ relatively prime to $n$, the linear permutation

$$x \mapsto ax + b \bmod n$$

of $X$ has sign $(-1)^{b(n-1)}(a/n)$.

Put $Y = \{0, 1, \ldots, m-1\}$ and $P = X \times Y$. We consider two transformations $\mu$
and $\nu$ of $P$, defined by

$$\mu(x, y) = (mx + y \bmod n,\ y), \qquad \nu(x, y) = (x,\ x + ny \bmod m).$$

For each fixed $y$, $\mu$ defines a permutation of the set $(X, y)$ with sign $(-1)^{y(n-1)}(m/n)$.
Since $\sum_{y=0}^{m-1} y = m(m-1)/2$, it follows that the permutation $\mu$ of $P$ has sign

$$\operatorname{sgn}(\mu) = (-1)^{(n-1)m(m-1)/2}(m/n)^m.$$

Similarly the permutation $\nu$ of $P$ has sign

$$\operatorname{sgn}(\nu) = (-1)^{(m-1)n(n-1)/2}(n/m)^n,$$

and hence $\alpha := \nu\mu^{-1}$ has sign

$$\operatorname{sgn}(\alpha) = (-1)^{(m-1)(n-1)(m+n)/2}(m/n)^m(n/m)^n.$$


But $\alpha$ is the permutation $(mx + y \bmod n, y) \mapsto (x, x + ny \bmod m)$ and its sign can be
determined directly in the following way.

Put $Z = \{0, 1, \ldots, mn-1\}$. By Proposition II.36, for any $(x, y) \in P$ there is a
unique $z \in Z$ such that

$$z \equiv x \bmod n, \qquad z \equiv y \bmod m.$$

Moreover, any $z \in Z$ is obtained in this way from a unique $(x, y) \in P$. For any
$z \in Z$, we will denote by $\rho(z)$ the corresponding element of $P$. Then the permutation
$\alpha$ can be written in the form $\rho(mx + y) \mapsto \rho(x + ny)$. Since $\rho$ is a bijective map,
the sign of the permutation $\alpha$ of $P$ will be the same as the sign of the permutation
$\beta = \rho^{-1}\alpha\rho: mx + y \mapsto x + ny$ of $Z$. An inversion of order for $\beta$ occurs when both
$mx + y > mx' + y'$ and $x + ny < x' + ny'$, i.e. when both $m(x - x') > y' - y$ and
$x - x' < n(y' - y)$. But these inequalities imply $mn(x - x') > x - x'$ and hence
$x > x'$, $y' > y$. Conversely, if $x > x'$, $y' > y$, then

$$m(x - x') \ge m > y' - y, \qquad n(y' - y) \ge n > x - x'.$$

Since the number of pairs $(x, y), (x', y') \in P$ with $x > x'$, $y < y'$ is $m(m-1)/2 \cdot n(n-1)/2$,
it follows that the sign of the permutation $\alpha$ is $(-1)^{mn(m-1)(n-1)/4}$. Comparing this
expression with the expression previously found, we obtain

$$(m/n)^m(n/m)^n = (-1)^{mn(m-1)(n-1)/4 + (m-1)(n-1)(m+n)/2}.$$

This simplifies to the first statement of the proposition if $m$ and $n$ are both odd, and to
the second statement if $m$ is odd and $n$ even.


Corollary 3 For any odd positive integer $n$, $(2/n) = 1$ or $-1$ according as $n \equiv \pm 1$
or $\pm 3 \bmod 8$.


Proof Since the result is already known for $n = 1$, we suppose $n > 1$. Then either $n$
or $n - 2$ is congruent to 1 mod 4 and so, by Proposition 1 and Proposition 2(i),

$$(2/n) = (-1/n)((n-2)/n) = (-1/n)(n/(n-2)) = (-1)^{(n-1)/2}(2/(n-2)).$$

Iterating, we obtain $(2/n) = (-1)^h$, where $h = (n-1)/2 + (n-3)/2 + \cdots + 1 =
(n^2 - 1)/8$. The result follows.
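Proposition 2 and Corollary 3 can be spot-checked against the permutation definition for small arguments. This sketch recomputes $(a/n)$ by counting inversions and compares both sides for all admissible pairs below a bound. A finite check of this kind is an illustration, not a proof. The helper names are hypothetical.

```python
import math

def jacobi_by_permutation(a: int, n: int) -> int:
    """Sign of the permutation x -> a x mod n of {0, ..., n-1}."""
    perm = [a * x % n for x in range(n)]
    inv = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return -1 if inv % 2 else 1

def check_proposition_2(limit: int) -> bool:
    for m in range(1, limit, 2):                   # m odd
        for n in range(1, limit):
            if math.gcd(m, n) != 1:
                continue
            if n % 2:                              # part (i): m and n both odd
                lhs = jacobi_by_permutation(m, n) * jacobi_by_permutation(n, m)
                if lhs != (-1) ** ((m - 1) * (n - 1) // 4):
                    return False
            else:                                  # part (ii): n even
                expected = 1 if n % 4 == 2 else (-1) ** ((m - 1) // 2)
                if jacobi_by_permutation(m, n) != expected:
                    return False
    return True

def check_corollary_3(limit: int) -> bool:
    return all(jacobi_by_permutation(2, n) == (1 if n % 8 in (1, 7) else -1)
               for n in range(1, limit, 2))

print(check_proposition_2(30), check_corollary_3(60))
```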


The value of (a/n) when n is even is completely determined by Propositions 1 
and 2. The evaluation of (a/n) when n is odd reduces by these propositions and Corol- 
lary 3 to the evaluation of (m/n) for odd m > 1. Although Proposition 2 does not 
provide a formula for the Jacobi symbol in this case, it does provide a method for its 
rapid evaluation, as we now show. 

If $m$ and $n$ are relatively prime odd positive integers, we can write $m = 2hn + \varepsilon_1 n_1$,
where $h \in \mathbb{Z}$, $\varepsilon_1 = \pm 1$ and $n_1$ is an odd positive integer less than $n$. Then $n$ and $n_1$ are
also relatively prime and

$$(m/n) = (\varepsilon_1/n)(n_1/n).$$

If $n_1 = 1$, we are finished. Otherwise, using Proposition 2(i), we obtain

$$(m/n) = (-1)^{(n-1)(n_1-1)/4}(\varepsilon_1/n)(n/n_1) = \pm(n/n_1),$$

where the minus sign holds if and only if $n$ and $\varepsilon_1 n_1$ are both congruent to 3 mod 4.
The process can now be repeated with $m, n$ replaced by $n, n_1$. After finitely many steps
the process must terminate with $n_s = 1$.
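The reduction just described is in effect an algorithm. The following Python sketch implements it for relatively prime odd positive $m, n$ only. The preliminary reductions for even or negative $m$, via Proposition 1 and Corollary 3, are left out. The name `jacobi_odd` is hypothetical. At each step the code records the factor $(\varepsilon_1/n)$ and the reciprocity sign from Proposition 2(i), then replaces $(m, n)$ by $(n, n_1)$.

```python
import math

def jacobi_odd(m: int, n: int) -> int:
    """(m/n) for relatively prime odd positive integers m, n,
    following the reduction m = 2hn + eps * n1 with n1 odd and 0 < n1 < n."""
    if m <= 0 or n <= 0 or m % 2 == 0 or n % 2 == 0 or math.gcd(m, n) != 1:
        raise ValueError("m, n must be relatively prime odd positive integers")
    sign = 1
    while n > 1:
        r = m % (2 * n)              # odd, 0 < r < 2n, and r != n since gcd(m, n) = 1
        eps, n1 = (1, r) if r < n else (-1, 2 * n - r)
        # (m/n) = (eps/n)(n1/n), where (-1/n) = -1 exactly when n = 3 mod 4
        if eps == -1 and n % 4 == 3:
            sign = -sign
        if n1 == 1:
            break
        # Proposition 2(i): (n1/n) = -(n/n1) exactly when n = n1 = 3 mod 4
        if n % 4 == 3 and n1 % 4 == 3:
            sign = -sign
        m, n = n, n1
    return sign

print(jacobi_odd(2985, 1951))   # 1, as in the worked example
```

With $m = 2985$ and $n = 1951$ the loop passes through the same moduli 917, 117, 19, 3 as the worked example, and returns 1.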


As an example,

$$
\begin{aligned}
(2985/1951) &= (-1/1951)(917/1951) = -(1951/917) = -(117/917) = -(917/117)\\
&= -(-1/117)(19/117) = -(117/19) = -(3/19) = (19/3) = (1/3) = 1.
\end{aligned}
$$


Further properties of the Jacobi symbol can be derived from those already estab- 
lished. 


Proposition 4 If $n, n'$ are positive integers and if $a$ is an integer relatively prime to $n$
such that $n' \equiv n \bmod 4a$, then $(a/n') = (a/n)$.


Proof If $a = -1$ then, since $n' \equiv n \bmod 4$, $(a/n') = (a/n)$, by Proposition 1(iv). If
$a = 2$ then, since $n$ and $n'$ are odd and $n' \equiv n \bmod 8$, $(a/n') = (a/n)$, by Corollary 3.
Consequently, by Proposition 1(iii), it is sufficient to prove the result for odd $a > 1$.
If $n$ is even, the result now follows from Proposition 2(ii). If $n$ is odd, it follows
from Proposition 2(i) and Proposition 1.


Proposition 5 If the integer $a$ is relatively prime to the odd positive integers $n$ and $n'$,
then $(a/nn') = (a/n)(a/n')$.


Proof We have $a \equiv a' \bmod nn'$ for some $a' \in \{1, 2, \ldots, nn'\}$. Since $nn'$ is odd, we
can choose $j \in \{0, 1, 2, 3\}$ so that $a'' = a' + jnn'$ satisfies $a'' \equiv 1 \bmod 4$. Then, by
Propositions 1 and 2,

$$(a/nn') = (a''/nn') = (nn'/a'') = (n/a'')(n'/a'') = (a''/n)(a''/n') = (a/n)(a/n').$$


Proposition 5 reduces the evaluation of (a/n) for odd positive n to the evaluation of 
(a/p), where p is an odd prime. This is where we make the connection with quadratic 
residues: 


Proposition 6 If $p$ is an odd prime and $a$ an integer not divisible by $p$, then $(a/p) = 1$
or $-1$ according as $a$ is a quadratic residue or nonresidue of $p$. Moreover, exactly half
of the integers $1, \ldots, p-1$ are quadratic residues of $p$.


Proof If $a$ is a quadratic residue of $p$, there exists an integer $x$ such that $x^2 \equiv a \bmod p$
and hence

$$(a/p) = (x^2/p) = (x/p)(x/p) = 1.$$
Let $g$ be a primitive root mod $p$. Then the integers $1, g, \ldots, g^{p-2} \bmod p$ are just a
rearrangement of the integers $1, 2, \ldots, p-1$. The permutation

$$\pi_g: x \mapsto gx \bmod p$$

fixes 0 and cyclically permutes the remaining elements $1, g, \ldots, g^{p-2}$. Since a cycle of
length $p - 1$ is a product of $p - 2$ transpositions, the number of inversions of order of $\pi_g$
has the same parity as $p - 2$, and it follows that $(g/p) = -1$. For any integer $a$ not
divisible by $p$ there is a unique $k \in \{0, 1, \ldots, p-2\}$ such that $a \equiv g^k \bmod p$. Hence

$$(a/p) = (g^k/p) = (g/p)^k = (-1)^k.$$

Thus $(a/p) = 1$ if and only if $k$ is even, and then $a \equiv x^2 \bmod p$ with $x = g^{k/2}$.

This proves the first statement of the proposition. Since exactly half the integers in 
the set {0, 1,..., p—2} are even, it also proves again (cf. Proposition II.28) the second 
statement. 
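The proof of Proposition 6 also gives a way to compute $(a/p)$: find the index $k$ with $a \equiv g^k \bmod p$ and take $(-1)^k$. The sketch below does this by brute force. It takes a discrete logarithm by repeated multiplication, so it is only suitable for small $p$. The helper names are hypothetical. Euler's criterion $a^{(p-1)/2} \bmod p$ can serve as an independent check of the result.

```python
def prime_factors(n: int) -> set[int]:
    """Distinct prime factors of n >= 1, by trial division."""
    factors, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors

def primitive_root(p: int) -> int:
    """Least primitive root modulo the odd prime p."""
    qs = prime_factors(p - 1)
    for g in range(2, p):
        if all(pow(g, (p - 1) // q, p) != 1 for q in qs):
            return g
    raise ValueError("p does not look like an odd prime")

def legendre_by_index(a: int, p: int) -> int:
    """(a/p) = (-1)^k, where a = g^k mod p for a primitive root g."""
    if a % p == 0:
        raise ValueError("p must not divide a")
    g = primitive_root(p)
    x, k = 1, 0
    while x != a % p:                # brute-force discrete logarithm
        x = x * g % p
        k += 1
    return -1 if k % 2 else 1

print([legendre_by_index(a, 7) for a in range(1, 7)])
```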


The law of quadratic reciprocity can now be established without difficulty: 


Theorem 7 Let $p$ and $q$ be distinct odd primes. Then the quadratic natures of $p$ mod $q$
and $q$ mod $p$ are the same if $p \equiv 1$ or $q \equiv 1 \bmod 4$, but different if $p \equiv q \equiv 3 \bmod 4$.


Proof The result follows at once from Proposition 6 since, by Proposition 2(i),
if either $p \equiv 1$ or $q \equiv 1 \bmod 4$ then $(p/q) = (q/p)$, but if $p \equiv q \equiv 3 \bmod 4$ then
$(p/q) = -(q/p)$.


Legendre (1798) defined (a/p) = 1 or —1 according as a was a quadratic residue 
or nonresidue of p, and Jacobi (1837) extended this definition to (a/n) for any odd 
positive integer n relatively prime to a by setting 


$$(a/n) = \prod_p (a/p),$$

where $p$ runs through the prime divisors of $n$, each occurring as often as its multiplicity.
Propositions 5 and 6 show that these definitions of Legendre and Jacobi are
equivalent to the definition adopted here. The relations $(-1/p) = (-1)^{(p-1)/2}$ and
$(2/p) = (-1)^{(p^2-1)/8}$ for odd primes $p$ are often called the first and second supplements
to the law of quadratic reciprocity.

It should be noted that, if the congruence $x^2 \equiv a \bmod n$ is soluble then $(a/n) = 1$,
but the converse need not hold when $n$ is not prime. For example, if $n = 21$ and
$a = 5$ then the congruence $x^2 \equiv 5 \bmod 21$ is insoluble, since both the congruences
$x^2 \equiv 5 \bmod 3$ and $x^2 \equiv 5 \bmod 7$ are insoluble, but

$$(5/21) = (5/3)(5/7) = (-1)^2 = 1.$$
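The example can be confirmed by brute force. The sketch computes the squares mod 21 directly and evaluates $(5/21)$ through Jacobi's product $(5/3)(5/7)$, using Euler's criterion for each Legendre symbol. The helper name `legendre` is hypothetical.

```python
def legendre(a: int, p: int) -> int:
    """(a/p) for an odd prime p not dividing a, by Euler's criterion."""
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1

n, a = 21, 5
squares_mod_n = {x * x % n for x in range(n)}
jacobi_a_n = legendre(a, 3) * legendre(a, 7)    # Jacobi's product over 21 = 3 * 7
print(a in squares_mod_n, jacobi_a_n)           # False 1
```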
The Jacobi symbol finds an interesting application in the proof of the following
result:


Proposition 8 If $a$ is an integer which is not a perfect square, then there exist infinitely
many primes $p$ not dividing $a$ for which $(a/p) = -1$.
Proof Suppose first that $a = -1$. Since $(-1/p) = (-1)^{(p-1)/2}$, we wish to show
that there are infinitely many primes $p \equiv 3 \bmod 4$. Clearly 3 is such a prime. Let
$\{p_1, \ldots, p_m\}$ be any finite set of such primes greater than 3. Adapting Euclid’s proof
of the infinity of primes (which is reproduced at the beginning of Chapter IX), we put

$$b = 4p_1 \cdots p_m + 3.$$

Then $b$ is odd, but not divisible by 3 or by any of the primes $p_1, \ldots, p_m$. Since
$b \equiv 3 \bmod 4$, at least one prime divisor $q$ of $b$ must satisfy $q \equiv 3 \bmod 4$. Thus the
set $\{3, p_1, \ldots, p_m\}$ does not contain all primes $p \equiv 3 \bmod 4$.

Suppose next that $a = \pm 2$. Then $(a/5) = -1$. Let $\{p_1, \ldots, p_m\}$ be any finite set
of primes greater than 3 such that $(a/p_i) = -1$ $(i = 1, \ldots, m)$ and put

$$b = 8p_1 \cdots p_m \pm 3,$$

where the $\pm$ sign is chosen according as $a = \pm 2$. Then $b$ is not divisible by 3 or
by any of the primes $p_1, \ldots, p_m$. Since $b \equiv \pm 3 \bmod 8$, we have $(2/b) = -1$ and
$(a/b) = -1$ in both cases. If $b = q_1 \cdots q_n$ is the representation of $b$ as a product of
primes (repetitions allowed), then

$$(a/b) = (a/q_1) \cdots (a/q_n)$$

and hence $(a/q_j) = -1$ for at least one $j$. Consequently the result holds also in this
case.

Consider now the general case. We may assume that $a$ is square-free, since if
$a = a'b^2$, where $a'$ is square-free, then $(a/p) = (a'/p)$ for every prime $p$ not
dividing $a$. Thus we can write

$$a = \varepsilon 2^e r_1 \cdots r_h,$$

where $\varepsilon = \pm 1$, $e = 0$ or $1$, and $r_1, \ldots, r_h$ are distinct odd primes. By what we have
already proved, we may assume $h \ge 1$.

Let $\{p_1, \ldots, p_m\}$ be any finite set of odd primes not containing any of the primes
$r_1, \ldots, r_h$. By Proposition 6, there exists an integer $c$ such that $(c/r_1) = -1$. Since the
moduli are relatively prime in pairs, by Corollary II.38 the simultaneous congruences

$$x \equiv 1 \bmod 8, \qquad x \equiv 1 \bmod p_i \ (i = 1, \ldots, m),$$

$$x \equiv c \bmod r_1, \qquad x \equiv 1 \bmod r_j \ (j = 2, \ldots, h),$$

have a positive solution $x = b$. Then $b$ is not divisible by any of the odd primes
$p_1, \ldots, p_m$ or $r_1, \ldots, r_h$. Moreover $(-1/b) = (2/b) = 1$, since $b \equiv 1 \bmod 8$. Since
$(r_j/b) = (b/r_j)$ for $1 \le j \le h$, it follows that

$$
\begin{aligned}
(a/b) &= (\varepsilon/b)(2/b)^e(r_1/b) \cdots (r_h/b)\\
&= (b/r_1)(b/r_2) \cdots (b/r_h) = (c/r_1)(1/r_2) \cdots (1/r_h) = -1.
\end{aligned}
$$

As in the special case previously considered, this implies that $(a/q) = -1$ for some
prime $q$ dividing $b$, and the result follows.
A second proof of the law of quadratic reciprocity will now be given. Let $p$ be an
odd prime and, for any integer $a$ not divisible by $p$, with Legendre define

$$(a/p) = 1 \text{ or } -1$$

according as $a$ is a quadratic residue or quadratic nonresidue of $p$. It follows from
Euler’s criterion (Proposition II.28) that

$$(ab/p) = (a/p)(b/p)$$

for any integers $a, b$ not divisible by $p$. Also, by Corollary II.29,

$$(-1/p) = (-1)^{(p-1)/2}.$$


Now let $q$ be an odd prime distinct from $p$ and let $K = \mathbb{F}_q$ be the finite field containing
$q$ elements. Since $p \ne q$, the polynomial $t^p - 1$ has no repeated factors in $K$
and thus has $p$ distinct roots in some field $L \supseteq K$. If $\zeta$ is any root other than 1, then
the (cyclotomic) polynomial

$$f(t) = t^{p-1} + t^{p-2} + \cdots + 1$$

has the roots $\zeta^k$ $(k = 1, \ldots, p-1)$.
Consider the Gauss sum

$$\tau = \sum_{x=1}^{p-1} (x/p)\zeta^x.$$

Instead of summing from 1 to $p - 1$, we can just as well sum over any set of representatives
of $\mathbb{F}_p^\times$:

$$\tau = \sum_{x \not\equiv 0 \bmod p} (x/p)\zeta^x.$$

Since $q$ is odd, $(x/p)^q = (x/p)$ and hence, since $L$ has characteristic $q$,

$$\tau^q = \sum_{x \not\equiv 0 \bmod p} (x/p)\zeta^{xq}.$$

If we put $y = xq$ then, since

$$(x/p) = (q^2x/p) = (qy/p) = (q/p)(y/p),$$

we obtain

$$\tau^q = \sum_{y \not\equiv 0 \bmod p} (q/p)(y/p)\zeta^y = (q/p)\tau.$$

Furthermore,

$$\tau^2 = \sum_{u, v \not\equiv 0 \bmod p} (u/p)(v/p)\zeta^{u+v} = \sum_{u, v \not\equiv 0 \bmod p} (uv/p)\zeta^{u+v}$$

or, putting $v = uw$,

$$\tau^2 = \sum_{w \not\equiv 0 \bmod p} (w/p) \sum_{u \not\equiv 0 \bmod p} \zeta^{u(1+w)}.$$


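Since the coefficients of $t^{p-1}$ and $t^{p-2}$ in $f(t)$ are 1, the sum of the roots is $-1$ and thus

$$\sum_{u \not\equiv 0 \bmod p} \zeta^{ua} = -1 \quad \text{if } a \not\equiv 0 \bmod p.$$

On the other hand, if $a \equiv 0 \bmod p$, then $\zeta^{ua} = 1$ and

$$\sum_{u \not\equiv 0 \bmod p} \zeta^{ua} = p - 1.$$

Hence

$$\tau^2 = (-1/p)(p-1) - \sum_{w \not\equiv 0, -1 \bmod p} (w/p) = (-1/p)p - \sum_{w \not\equiv 0 \bmod p} (w/p).$$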
w¥0,—1 mod p w 0 mod p 


Since there are equally many quadratic residues and quadratic nonresidues, the last
sum vanishes and we obtain finally

$$\tau^2 = (-1)^{(p-1)/2}p.$$

Thus $\tau \ne 0$ and from the previous expression for $\tau^q$ we now obtain

$$\tau^{q-1} = (q/p).$$

But

$$\tau^{q-1} = (\tau^2)^{(q-1)/2} = (-1)^{(p-1)(q-1)/4}p^{(q-1)/2}$$

and $p^{(q-1)/2} = (p/q)$, by Proposition II.28 again. Hence

$$(q/p) = (-1)^{(p-1)(q-1)/4}(p/q),$$


which is the law of quadratic reciprocity. 
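The identity $\tau^2 = (-1)^{(p-1)/2}p$ does not depend on working in characteristic $q$, so it can also be checked with complex numbers, taking $\zeta = e^{2\pi i/p}$. This replaces the finite field $L$ of the proof by $\mathbb{C}$ and is only a floating-point illustration. The function names are hypothetical.

```python
import cmath
import math

def legendre(a: int, p: int) -> int:
    """(a/p) for an odd prime p not dividing a, by Euler's criterion."""
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1

def gauss_sum_tau(p: int) -> complex:
    """tau = sum of (x/p) zeta^x over x = 1, ..., p - 1, with zeta = exp(2 pi i/p)."""
    return sum(legendre(x, p) * cmath.exp(2j * math.pi * x / p) for x in range(1, p))

for p in (3, 5, 7, 11, 13):
    tau = gauss_sum_tau(p)
    print(p, tau * tau, (-1) ** ((p - 1) // 2) * p)   # equal up to rounding
```

Over $\mathbb{C}$ one also observes the sign that Gauss determined, which is discussed next: numerically $\tau = \sqrt{p}$ for $p \equiv 1 \bmod 4$ and $\tau = i\sqrt{p}$ for $p \equiv 3 \bmod 4$.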

The preceding proof is a variant of the sixth proof of Gauss (1818). Already in
1801 Gauss had shown that if $p$ is an odd prime, then

$$\sum_{k=0}^{p-1} e^{2\pi i k^2/p} = \pm\sqrt{p} \text{ or } \pm i\sqrt{p} \quad \text{according as } p \equiv 1 \text{ or } p \equiv 3 \bmod 4.$$

After four more years of labour he managed to show that in fact the + signs must be
taken. From this result he obtained his fourth proof of the law of quadratic reciprocity.
The sixth proof avoided this sign determination, but Gauss’s result is of interest in
itself. Dirichlet (1835) derived it by a powerful analytic method, which is readily
generalized. Although we will make no later use of it, we now present Dirichlet’s argument.
For any positive integers m,n, we define the Gauss sum G(m, n) by 
$$G(m, n) = \sum_{v=0}^{n-1} e^{2\pi i v^2 m/n}.$$

Instead of summing from 0 to $n - 1$ we can just as well sum over any complete set of
representatives of the integers mod $n$:

$$G(m, n) = \sum_{v \bmod n} e^{2\pi i v^2 m/n}.$$


Gauss sums have a useful multiplicative property: 


Proposition 9 If $m, n, n'$ are positive integers, with $n$ and $n'$ relatively prime, then
$G(mn', n)G(mn, n') = G(m, nn')$.


Proof When $v$ and $v'$ run through complete sets of representatives of the integers mod
$n$ and mod $n'$ respectively, $\mu = vn' + v'n$ runs through a complete set of representatives
of the integers mod $nn'$. Moreover

$$\mu^2 m = (vn' + v'n)^2 m \equiv (v^2n'^2 + v'^2n^2)m \bmod nn'.$$

It follows that

$$G(mn', n)G(mn, n') = \sum_{v \bmod n}\ \sum_{v' \bmod n'} e^{2\pi i (mn'^2v^2 + mn^2v'^2)/nn'} = \sum_{\mu \bmod nn'} e^{2\pi i \mu^2 m/nn'} = G(m, nn').$$
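Proposition 9 is easy to test numerically. In the sketch below the exponent $v^2 m$ is reduced mod $n$ before exponentiating, which keeps the angles small and the rounding error low. The names `gauss_sum` and `check_multiplicativity` are hypothetical, and the tolerance is a rough floating-point choice.

```python
import cmath
import math

def gauss_sum(m: int, n: int) -> complex:
    """G(m, n) = sum of exp(2 pi i v^2 m / n) over v = 0, ..., n - 1."""
    return sum(cmath.exp(2j * math.pi * ((v * v * m) % n) / n) for v in range(n))

def check_multiplicativity(m: int, n: int, n2: int) -> bool:
    """Proposition 9: G(m n2, n) G(m n, n2) = G(m, n n2) when gcd(n, n2) = 1."""
    lhs = gauss_sum(m * n2, n) * gauss_sum(m * n, n2)
    return abs(lhs - gauss_sum(m, n * n2)) < 1e-8

print(check_multiplicativity(1, 3, 5), check_multiplicativity(2, 4, 9))
```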


A deeper result is the following reciprocity formula, due to Schaar (1848):

Proposition 10 For any positive integers $m, n$,

$$G(m, n) = \sqrt{n/m}\, C \sum_{\mu=0}^{2m-1} e^{-\pi i \mu^2 n/2m},$$

where $C = (1 + i)/2$.


Proof Let $f: \mathbb{R} \to \mathbb{C}$ be a function which is continuously differentiable when
restricted to the interval $[0, n]$ and which vanishes outside this interval. Since the sum

$$F(t) = \sum_{k=-\infty}^{\infty} f(t + k)$$

has only finitely many nonzero terms, the function $F$ has period 1 and is continuously
differentiable, except possibly for jump discontinuities when $t$ is an integer. Therefore,
by Dirichlet’s convergence criterion in the theory of Fourier series,

$$\{F(+0) + F(-0)\}/2 = \lim_{N \to \infty} \sum_{h=-N}^{N} \int_0^1 e^{-2\pi i h t} F(t)\, dt.$$

But

$$\int_0^1 e^{-2\pi i h t} F(t)\, dt = \sum_{k=-\infty}^{\infty} \int_0^1 e^{-2\pi i h t} f(t + k)\, dt = \sum_{k=-\infty}^{\infty} \int_k^{k+1} e^{-2\pi i h t} f(t)\, dt = \int_0^n e^{-2\pi i h t} f(t)\, dt.$$

Thus we obtain

$$f(0)/2 + f(1) + \cdots + f(n-1) + f(n)/2 = \lim_{N \to \infty} \sum_{h=-N}^{N} \int_0^n e^{-2\pi i h t} f(t)\, dt. \qquad (*)$$


This is a simple form of Poisson’s summation formula (which makes an appearance 
also in Chapters IX and X). 

In particular, if we take $f(t) = e^{2\pi i t^2 m/n}$ $(0 \le t \le n)$, where $m$ is also a positive
integer, then the left side of $(*)$ is just the Gauss sum $G(m, n)$, since $f(n) = f(0) = 1$.
We will now evaluate the right side of $(*)$ for this case. Put $h = 2mq + \mu$, where $q$ and $\mu$
are integers and $0 \le \mu < 2m$. Then

$$e^{-2\pi i h t} f(t) = e^{2\pi i m(t - nq)^2/n} e^{-2\pi i \mu t}.$$

As $h$ runs through all the integers, $q$ does also and $\mu$ runs independently through the
integers $0, \ldots, 2m-1$. Hence


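$$\lim_{N \to \infty} \sum_{h=-N}^{N} \int_0^n e^{-2\pi i h t} f(t)\, dt = \sum_{\mu=0}^{2m-1} \int_{-\infty}^{\infty} e^{2\pi i m t^2/n - 2\pi i \mu t}\, dt,$$

since $e^{-2\pi i \mu nq} = 1$, so that the substitution $t \mapsto t + nq$ combines the integrals over
$[0, n]$ for all $q \in \mathbb{Z}$ into one integral over $\mathbb{R}$. Completing the square,
$mt^2/n - \mu t = (m/n)(t - \mu n/2m)^2 - \mu^2 n/4m$, and changing the variable of integration,
we obtain

$$\lim_{N \to \infty} \sum_{h=-N}^{N} \int_0^n e^{-2\pi i h t} f(t)\, dt = \sqrt{n/m}\, C \sum_{\mu=0}^{2m-1} e^{-\pi i \mu^2 n/2m},$$

where $C$ is the Fresnel integral

$$C = \int_{-\infty}^{\infty} e^{2\pi i t^2}\, dt.$$

(This is an important example of an infinite integral which converges, although the
integrand does not tend to zero.) From $(*)$ we now obtain the formula for $G(m, n)$ in
the statement of the proposition. To determine the value of the constant $C$, take $m = 1$,
$n = 3$. We obtain $i\sqrt{3} = \sqrt{3}\,C(1 + i)$, which simplifies to $C = (1 + i)/2$.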


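From Proposition 10 with $m = 1$ we obtain

$$G(1, n) = \sum_{v=0}^{n-1} e^{2\pi i v^2/n} = \begin{cases} (1 + i)\sqrt{n} & \text{if } n \equiv 0 \pmod 4,\\ \sqrt{n} & \text{if } n \equiv 1 \pmod 4,\\ 0 & \text{if } n \equiv 2 \pmod 4,\\ i\sqrt{n} & \text{if } n \equiv 3 \pmod 4. \end{cases}$$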
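Schaar's formula and the table for $G(1, n)$ can be compared with direct summation. This floating-point sketch is our own check and not part of Dirichlet's argument. It evaluates both sides of Proposition 10 with $C = (1 + i)/2$, reducing each exponent modulo its period first. The function names are hypothetical.

```python
import cmath
import math

def gauss_sum(m: int, n: int) -> complex:
    return sum(cmath.exp(2j * math.pi * ((v * v * m) % n) / n) for v in range(n))

def schaar_rhs(m: int, n: int) -> complex:
    """sqrt(n/m) * C * sum over mu < 2m of exp(-pi i mu^2 n / 2m), C = (1 + i)/2."""
    C = (1 + 1j) / 2
    # exp(-pi i mu^2 n / 2m) depends only on mu^2 n mod 4m
    s = sum(cmath.exp(-1j * math.pi * ((mu * mu * n) % (4 * m)) / (2 * m))
            for mu in range(2 * m))
    return math.sqrt(n / m) * C * s

def g1_closed_form(n: int) -> complex:
    r = math.sqrt(n)
    return [(1 + 1j) * r, r, 0, 1j * r][n % 4]

print(gauss_sum(1, 3), schaar_rhs(1, 3), g1_closed_form(3))   # each close to i*sqrt(3)
```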


If $m$ and $n$ are both odd, it follows that

$$G(1, mn) = G(1, m)G(1, n) \quad \text{if either } m \equiv 1 \text{ or } n \equiv 1 \bmod 4,$$

$$G(1, mn) = -G(1, m)G(1, n) \quad \text{if } m \equiv n \equiv 3 \bmod 4;$$

i.e.

$$G(1, mn) = (-1)^{(m-1)(n-1)/4}G(1, m)G(1, n).$$

If, in addition, $m$ and $n$ are relatively prime, then $G(m, n)G(n, m) = G(1, mn)$, by
Proposition 9. Hence, if the integers $m, n$ are odd, positive and relatively prime, then

$$G(m, n)G(n, m) = (-1)^{(m-1)(n-1)/4}G(1, m)G(1, n).$$

For any odd, positive relatively prime integers $m, n$, put

$$\rho(m, n) = G(m, n)/G(1, n).$$

Then

$$\rho(1, n) = 1, \qquad \rho(m, n) = \rho(m', n) \text{ if } m \equiv m' \bmod n, \qquad \rho(m, n)\rho(n, m) = (-1)^{(m-1)(n-1)/4}.$$
We claim that $\rho(m, n)$ is just the Jacobi symbol $(m/n)$. This is evident if $m = 1$ and,
by Proposition 2(i), if $\rho(m, n) = (m/n)$, then also $\rho(n, m) = (n/m)$.

Hence if the claim is not true for all $m, n$, there is a pair $m, n$ with $1 < m < n$ such
that

$$\rho(m, n) \ne (m/n),$$

but $\rho(\mu, \nu) = (\mu/\nu)$ for all odd, positive relatively prime integers $\mu, \nu$ with $\mu < m$.
We can write $n = km + r$ for some positive integers $k, r$ with $r < m$.
Then

$$\rho(n, m) = \rho(r, m) = (r/m) = (n/m).$$

Since $\rho(m, n)\rho(n, m) = (-1)^{(m-1)(n-1)/4} = (m/n)(n/m)$, it follows that $\rho(m, n) = (m/n)$,
which is a contradiction. Thus, if $n$ is an odd positive integer,

$$G(m, n) = (m/n)G(1, n)$$

for any odd positive integer $m$ relatively prime to $n$.


In fact this relation holds also if $m$ is negative, since

$$G(-1, n) = (-1)^{(n-1)/2}G(1, n) \quad \text{and} \quad G(-m, n) = \overline{G(m, n)}.$$


(It may be shown that the relation holds also if $m$ is even.) As we have already obtained
an explicit formula for $G(1, n)$, we now have also an explicit evaluation of $G(m, n)$.
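The relation $G(m, n) = (m/n)G(1, n)$ can be tested in the same way. The sketch below computes the Jacobi symbol from Jacobi's product definition, with a Legendre symbol for each prime factor of $n$ evaluated by Euler's criterion, rather than from the permutation definition. The names are hypothetical. As in the text, only odd $m$ are tested, both positive and negative.

```python
import cmath
import math

def gauss_sum(m: int, n: int) -> complex:
    return sum(cmath.exp(2j * math.pi * ((v * v * m) % n) / n) for v in range(n))

def jacobi(a: int, n: int) -> int:
    """Jacobi symbol (a/n) for odd n > 0 with gcd(a, n) = 1, computed as the
    product of Legendre symbols (a/p) over the prime factors p of n."""
    result, rest, d = 1, n, 3
    while rest > 1:
        if d * d > rest:
            d = rest                     # what is left is a prime
        while rest % d == 0:
            rest //= d
            # Euler's criterion: a^((d-1)/2) is 1 or d - 1 modulo the prime d
            result *= 1 if pow(a, (d - 1) // 2, d) == 1 else -1
        d += 2
    return result

print(gauss_sum(3, 5), jacobi(3, 5) * gauss_sum(1, 5))   # both close to -sqrt(5)
```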


2 Quadratic Fields 


Let $\zeta$ be a complex number which is not rational, but whose square is rational. Since
$\zeta \notin \mathbb{Q}$, a complex number $\alpha$ has at most one representation of the form $\alpha = r + s\zeta$,
where $r, s \in \mathbb{Q}$. Let $\mathbb{Q}(\zeta)$ denote the set of all complex numbers $\alpha$ which have a
representation of this form. Then $\mathbb{Q}(\zeta)$ is a field, since it is closed under subtraction
and multiplication and since, if $r$ and $s$ are not both zero,

$$(r + s\zeta)^{-1} = (r - s\zeta)/(r^2 - s^2\zeta^2).$$


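Evidently $\mathbb{Q}(\zeta) = \mathbb{Q}(t\zeta)$ for any nonzero rational number $t$. Conversely, if $\mathbb{Q}(\zeta) = \mathbb{Q}(\zeta^*)$, then $\zeta^* = t\zeta$ for some nonzero rational number $t$. For $\zeta^* = r + s\zeta$, where $r, s \in \mathbb{Q}$ and $s \neq 0$, and hence

$$r\zeta = (\zeta^{*2} - r^2 - s^2\zeta^2)/2s.$$

Thus $\zeta\zeta^* = r\zeta + s\zeta^2$ is rational, and so is $\zeta\zeta^*/\zeta^2 = \zeta^*/\zeta$.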

It follows that without loss of generality we may assume that $\zeta^2 = d$ is a square-free integer. Then $t^2 d \in \mathbb{Z}$ for some $t \in \mathbb{Q}$ implies $t \in \mathbb{Z}$. If $\zeta^{*2} = d^*$ is also a square-free integer, then $\mathbb{Q}(\zeta) = \mathbb{Q}(\zeta^*)$ if and only if $d = d^*$ and $\zeta^* = \pm\zeta$.

The quadratic field $\mathbb{Q}(\sqrt{d})$ is said to be real if $d > 0$ and imaginary if $d < 0$. We define the conjugate of an element $\alpha = r + s\sqrt{d}$ of the quadratic field $\mathbb{Q}(\sqrt{d})$ to be the element $\alpha' = r - s\sqrt{d}$. It is easily verified that

$$(\alpha + \beta)' = \alpha' + \beta', \qquad (\alpha\beta)' = \alpha'\beta'.$$


Since the map $\sigma : \alpha \mapsto \alpha'$ is also bijective, it is an automorphism of the field $\mathbb{Q}(\sqrt{d})$. Since $\alpha' = \alpha$ if and only if $s = 0$, the rational field $\mathbb{Q}$ is the fixed point set of $\sigma$. Since $(\alpha')' = \alpha$, the automorphism $\sigma$ is an ‘involution’.

We define the norm of an element $\alpha = r + s\sqrt{d}$ of the quadratic field $\mathbb{Q}(\sqrt{d})$ to be the rational number

$$N(\alpha) = \alpha\alpha' = r^2 - ds^2.$$


Evidently $N(\alpha) = N(\alpha')$, and $N(\alpha) = 0$ if and only if $\alpha = 0$. From the relation $(\alpha\beta)' = \alpha'\beta'$ we obtain

$$N(\alpha\beta) = N(\alpha)N(\beta).$$


An element $\alpha$ of the quadratic field $\mathbb{Q}(\sqrt{d})$ is said to be an integer of this field if it is a root of a quadratic polynomial $t^2 + at + b$ with coefficients $a, b \in \mathbb{Z}$. (Equivalently, the integers of $\mathbb{Q}(\sqrt{d})$ are the elements which are algebraic integers.)

It follows from Proposition II.16 that $\alpha \in \mathbb{Q}$ is an integer of the field $\mathbb{Q}(\sqrt{d})$ if and only if $\alpha \in \mathbb{Z}$. Suppose now that $\alpha = r + s\sqrt{d}$, where $r, s \in \mathbb{Q}$ and $s \neq 0$. Then $\alpha$ is a root of the quadratic polynomial

$$f(x) = (x - \alpha)(x - \alpha') = x^2 - 2rx + r^2 - ds^2.$$

Moreover, this is the unique monic quadratic polynomial with rational coefficients which has $\alpha$ as a root.

Consequently, if $\alpha$ is an integer of $\mathbb{Q}(\sqrt{d})$, then so also is its conjugate $\alpha'$, and its norm $N(\alpha) = r^2 - ds^2$ is an ordinary integer.


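Proposition 11 Let $d$ be a square-free integer and define $\omega$ by

$$\omega = \begin{cases} \sqrt{d} & \text{if } d \equiv 2 \text{ or } 3 \pmod 4,\\ (\sqrt{d} - 1)/2 & \text{if } d \equiv 1 \pmod 4. \end{cases}$$

Then $\alpha$ is an integer of the quadratic field $\mathbb{Q}(\sqrt{d})$ if and only if $\alpha = a + b\omega$ for some $a, b \in \mathbb{Z}$.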


Proof Suppose $\alpha = r + s\sqrt{d}$, where $r, s \in \mathbb{Q}$. As we have seen, if $s = 0$ then $\alpha$ is an integer of $\mathbb{Q}(\sqrt{d})$ if and only if $r \in \mathbb{Z}$. If $s \neq 0$, then $\alpha$ is an integer of $\mathbb{Q}(\sqrt{d})$ if and only if $a = 2r$ and $b = r^2 - ds^2$ are ordinary integers. If $a$ is even, i.e. if $r \in \mathbb{Z}$, then $b \in \mathbb{Z}$ if and only if $ds^2 \in \mathbb{Z}$ and hence, since $d$ is square-free, if and only if $s \in \mathbb{Z}$. If $a$ is odd, then $a^2 \equiv 1 \pmod 4$ and hence $b \in \mathbb{Z}$ if and only if $4ds^2 \equiv 1 \pmod 4$. Since $d$ is square-free, this implies that $2s \in \mathbb{Z}$, $s \notin \mathbb{Z}$. Hence $2s$ is odd and $d \equiv 1 \pmod 4$. Conversely, if $2r$ and $2s$ are odd integers and $d \equiv 1 \pmod 4$, then $r^2 - ds^2 \in \mathbb{Z}$. The result follows.
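Proposition 11 can be checked mechanically on a grid of rational numbers. In this Python sketch, `is_field_integer` applies the definition (for $s \neq 0$ the coefficients $2r$ and $r^2 - ds^2$ of the minimal polynomial must be integers), while `is_in_z_omega` tests whether $r + s\sqrt{d} = a + b\omega$ with $a, b \in \mathbb{Z}$. Both names are illustrative, and exact `Fraction` arithmetic avoids rounding.

```python
from fractions import Fraction


def is_int(x: Fraction) -> bool:
    """True if the rational number x is an ordinary integer."""
    return x.denominator == 1


def is_field_integer(r: Fraction, s: Fraction, d: int) -> bool:
    """Definition: r + s*sqrt(d) is a root of a monic quadratic with integer coefficients."""
    if s == 0:
        return is_int(r)
    # For s != 0 the minimal polynomial is x^2 - 2r x + (r^2 - d s^2).
    return is_int(2 * r) and is_int(r * r - d * s * s)


def is_in_z_omega(r: Fraction, s: Fraction, d: int) -> bool:
    """Proposition 11: is r + s*sqrt(d) equal to a + b*omega with a, b in Z?"""
    if d % 4 == 1:
        # omega = (sqrt(d) - 1)/2, so a + b*omega = (a - b/2) + (b/2)*sqrt(d): b = 2s, a = r + s.
        return is_int(2 * s) and is_int(r + s)
    # omega = sqrt(d) when d = 2 or 3 mod 4: a = r, b = s.
    return is_int(r) and is_int(s)


if __name__ == "__main__":
    half = Fraction(1, 2)
    print(is_field_integer(half, half, 5), is_in_z_omega(half, half, 5))   # True True
    print(is_field_integer(half, half, 3), is_in_z_omega(half, half, 3))   # False False
```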


Since $\omega^2 = -\omega + (d-1)/4$ in the case $d \equiv 1 \pmod 4$, it follows directly from Proposition 11 that the set $\mathcal{O}_d$ of all integers of the field $\mathbb{Q}(\sqrt{d})$ is closed under subtraction and multiplication and consequently is a ring. In fact $\mathcal{O}_d$ is an integral domain, since $\mathcal{O}_d \subseteq \mathbb{Q}(\sqrt{d})$.

For example, $\mathcal{O}_{-1} = \mathcal{G}$ is the ring of Gaussian integers $a + bi$, where $a, b \in \mathbb{Z}$. They form a square ‘lattice’ in the complex plane. Similarly $\mathcal{O}_{-3} = \mathcal{E}$ is the ring of all complex numbers $a + b\rho$, where $a, b \in \mathbb{Z}$ and $\rho = (i\sqrt{3} - 1)/2$ is a cube root of unity. These Eisenstein integers were studied by Eisenstein (1844). They form a hexagonal ‘lattice’ in the complex plane.
We have already seen in §6 of Chapter II that the ring $\mathcal{G}$ of Gaussian integers is a Euclidean domain, with $\delta(\alpha) = N(\alpha)$. We now show that the ring $\mathcal{E}$ of Eisenstein integers is also a Euclidean domain, with $\delta(\alpha) = N(\alpha)$. If $\alpha, \beta \in \mathcal{E}$ and $\alpha \neq 0$, then

$$\beta\alpha^{-1} = \beta\alpha'/\alpha\alpha' = r + s\rho,$$

where $r, s \in \mathbb{Q}$. Choose $a, b \in \mathbb{Z}$ so that

$$|r - a| \le 1/2, \qquad |s - b| \le 1/2.$$

If $\kappa = a + b\rho$, then $\kappa \in \mathcal{E}$ and

$$N(\beta\alpha^{-1} - \kappa) = \{r - a - (s - b)/2\}^2 + 3\{(s - b)/2\}^2 \le (3/4)^2 + 3(1/4)^2 = 3/4 < 1.$$

Thus $\beta - \kappa\alpha \in \mathcal{E}$ and $N(\beta - \kappa\alpha) < N(\alpha)$.
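The division step above translates directly into code. In this sketch an Eisenstein integer $a + b\rho$ is stored as the pair `(a, b)`, using $\rho^2 = -1 - \rho$ and $\rho' = \rho^2$; the helper names are hypothetical. The quotient $\kappa$ is obtained by rounding the rational coordinates of $\beta\alpha'/N(\alpha)$, so the remainder satisfies $N(\beta - \kappa\alpha) \le \tfrac34 N(\alpha)$.

```python
from fractions import Fraction

# An Eisenstein integer a + b*rho is stored as the pair (a, b), where
# rho = (i*sqrt(3) - 1)/2 satisfies rho^2 = -1 - rho.


def mul(x, y):
    """(a + b*rho)(c + d*rho) = (ac - bd) + (ad + bc - bd)*rho."""
    (a, b), (c, d) = x, y
    return (a * c - b * d, a * d + b * c - b * d)


def sub(x, y):
    return (x[0] - y[0], x[1] - y[1])


def conj(x):
    """Conjugate: rho' = rho^2 = -1 - rho, so (a + b*rho)' = (a - b) - b*rho."""
    a, b = x
    return (a - b, -b)


def norm(x):
    """N(a + b*rho) = a^2 - ab + b^2."""
    a, b = x
    return a * a - a * b + b * b


def eis_divmod(beta, alpha):
    """Return (kappa, rem) with beta = kappa*alpha + rem and N(rem) <= (3/4) N(alpha)."""
    n = norm(alpha)
    if n == 0:
        raise ZeroDivisionError("alpha must be nonzero")
    num = mul(beta, conj(alpha))              # beta*alpha' = (r + s*rho) * N(alpha)
    r, s = Fraction(num[0], n), Fraction(num[1], n)
    kappa = (round(r), round(s))              # |r - a| <= 1/2 and |s - b| <= 1/2
    return kappa, sub(beta, mul(kappa, alpha))
```

Exact `Fraction` arithmetic is used for $r$ and $s$ so that the rounding step matches the argument in the text exactly.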
Since $\mathcal{G}$ and $\mathcal{E}$ are Euclidean domains, the divisibility theory of Chapter II is valid for them. As an application, we prove


Proposition 12 The equation $x^3 + y^3 = z^3$ has no solutions in nonzero integers.


Proof Assume on the contrary that such a solution exists and choose one for which $|xyz|$ is a minimum. Then $(x, y) = (x, z) = (y, z) = 1$. If 3 did not divide $xyz$, then $x^3$, $y^3$ and $z^3$ would be congruent to $\pm 1 \pmod 9$, which contradicts $x^3 + y^3 = z^3$. So, without loss of generality, we may assume that $3 \mid z$. Then $x^3 + y^3 \equiv 0 \pmod 3$ and, again without loss of generality, we may assume that $x \equiv 1 \pmod 3$, $y \equiv -1 \pmod 3$. This implies that

$$x^2 - xy + y^2 \equiv 3 \pmod 9.$$


If $x + y$ and $x^2 - xy + y^2$ have a common prime divisor $p$, then $p$ divides $3xy$, since $3xy = (x + y)^2 - (x^2 - xy + y^2)$, and this implies $p = 3$, since $(x, y) = 1$. Since

$$(x + y)(x^2 - xy + y^2) = x^3 + y^3 = z^3 \equiv 0 \pmod{27},$$

it follows that

$$x + y = 9a^3, \qquad x^2 - xy + y^2 = 3b^3,$$

where $a, b \in \mathbb{Z}$ and $3 \nmid b$.

Fig. 1. Gaussian and Eisenstein integers ($\mathcal{G}$: Gaussian integers; $\mathcal{E}$: Eisenstein integers).
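We now shift operations to the Euclidean domain $\mathcal{E}$ of Eisenstein integers. We have

$$x^2 - xy + y^2 = (x + y\rho)(x + y\rho^2),$$

where $\rho = (i\sqrt{3} - 1)/2$ is a cube root of unity. Put $\lambda = 1 - \rho$, so that $(1 + \rho)\lambda^2 = 3$. Then $\lambda$ is a common divisor of $x + y\rho$ and $x + y\rho^2$, since

$$x + y\rho = x + y - y\lambda, \qquad x + y\rho^2 = x - 2y + y\lambda,$$

and $x + y \equiv 0 \equiv x - 2y \pmod 3$. In fact $\lambda$ is the greatest common divisor of $x + y\rho$ and $x + y\rho^2$ since, for all $m, n \in \mathbb{Z}$,

$$(m + n + n\rho)(x + y\rho^2) - (n + m\rho + n\rho)(x + y\rho) = (mx + ny)\lambda,$$

and we can choose $m, n$ so that $mx + ny = 1$. Since $\lambda^2 = -3\rho$ and since $\rho$ is a unit, from $(x + y\rho)(x + y\rho^2) = 3b^3$ and the unique factorization of $b$ in $\mathcal{E}$, we now obtain

$$x + y\rho = \varepsilon\lambda(c + d\rho)^3,$$

where $c, d \in \mathbb{Z}$ and $\varepsilon$ is a unit. From

$$(x + y\rho)/\lambda = x - \lambda(x + y)/3 = x - 3a^3\lambda$$

and

$$(c + d\rho)^3 = c^3 - 3cd^2 + d^3 + 3cd(c - d)\rho,$$

by reducing mod 3 we get

$$\varepsilon(c^3 + d^3) \equiv 1 \pmod 3.$$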


Since the units in $\mathcal{E}$ are $\pm 1$, $\pm\rho$, $\pm\rho^2$ (by the following Proposition 13), this implies $\varepsilon = \pm 1$. In fact we may suppose $\varepsilon = 1$, by changing the signs of $c$ and $d$. Equating coefficients of $\rho$, we now get

$$a^3 = cd(c - d).$$


But $(c, d) = 1$, since $(x, y) = 1$, and hence also $(c, c - d) = (d, c - d) = 1$. It follows that $c = x_1^3$, $d = y_1^3$, $c - d = z_1^3$ for some $x_1, y_1, z_1 \in \mathbb{Z}$. Thus $y_1^3 + z_1^3 = x_1^3$ and

$$|x_1 y_1 z_1| = |a| = |z/3b| < |xyz|.$$

But this contradicts the definition of $x, y, z$.
The proof of Proposition 12 illustrates how problems involving ordinary integers
may be better understood by viewing them as integers in a larger field of algebraic 
numbers. 

We now return to the study of an arbitrary quadratic field $\mathbb{Q}(\sqrt{d})$, where $d$ is a square-free integer. For convenience of writing we put $\mathcal{O} = \mathcal{O}_d$. As in Chapter II, we say that $\varepsilon \in \mathcal{O}$ is a unit if there exists $\eta \in \mathcal{O}$ such that $\varepsilon\eta = 1$. For example, 1 and $-1$ are units. The set $U$ of all units is evidently an abelian group under multiplication. Moreover, if $\varepsilon \in U$, then also $\varepsilon' \in U$.

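If $\varepsilon$ is a unit, then $N(\varepsilon) = \pm 1$, since $\varepsilon\eta = 1$ implies $N(\varepsilon)N(\eta) = 1$. Conversely, if $\varepsilon \in \mathcal{O}$ and $N(\varepsilon) = \pm 1$, then $\varepsilon$ is a unit, since $N(\varepsilon) = \varepsilon\varepsilon'$ and $\varepsilon' \in \mathcal{O}$. (Note, however, that $N(\alpha) = \pm 1$ does not imply $\alpha \in \mathcal{O}$. For example, in $\mathbb{Q}(\sqrt{-1})$, $\alpha = (3 + 4i)/5 \notin \mathcal{G}$, although $N(\alpha) = 1$.) It follows that, when $d \equiv 2$ or $3 \pmod 4$, $\alpha = a + b\sqrt{d}$ is a unit if and only if $a, b \in \mathbb{Z}$ and

$$a^2 - db^2 = \pm 1.$$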


On the other hand, when $d \equiv 1 \pmod 4$, $\alpha = a + b(\sqrt{d} - 1)/2$ is a unit if and only if $a, b \in \mathbb{Z}$ and

$$(b - 2a)^2 - db^2 = \pm 4.$$

But if $b, c \in \mathbb{Z}$ and $c^2 - db^2 = \pm 4$, then $c^2 \equiv b^2 \pmod 4$ and hence $c \equiv b \pmod 2$.

Consequently, the units of $\mathcal{O}$ are determined by the solutions of the Diophantine equations $x^2 - dy^2 = \pm 4$ or $x^2 - dy^2 = \pm 1$, according as $d \equiv 1$ or $d \not\equiv 1 \pmod 4$. This makes it possible to determine all units, as we now show.


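Proposition 13 The units of $\mathcal{O}_{-1}$ are $\pm 1, \pm i$ and the units of $\mathcal{O}_{-3}$ are $\pm 1$, $(\pm 1 \pm i\sqrt{3})/2$. For every other square-free integer $d < 0$, the only units of $\mathcal{O}_d$ are $\pm 1$.

For each square-free integer $d > 0$, there exists a unit $\varepsilon_0 > 1$ such that all units of $\mathcal{O}_d$ are given by $\pm\varepsilon_0^n$ $(n \in \mathbb{Z})$.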


Proof Suppose first that $d < 0$. Then only the Diophantine equations with the $+$ signs need to be considered. If $d < -4$, the only solutions of $x^2 - dy^2 = 4$ are $y = 0$, $x = \pm 2$. If $d < -4$ or if $d = -2$, the only solutions of $x^2 - dy^2 = 1$ are $y = 0$, $x = \pm 1$. In these cases the only units are $\pm 1$. (The group $U$ is a cyclic group of order 2, with $-1$ as generator.) If $d = -3$, the only solutions of $x^2 - dy^2 = 4$ are $y = 0$, $x = \pm 2$ and $y = \pm 1$, $x = \pm 1$. Hence the units are $\pm 1$, $\pm\rho$, $\pm\rho^2$, where $\rho = (i\sqrt{3} - 1)/2$. (The group $U$ is a cyclic group of order 6, with $-\rho$ as generator.) If $d = -1$, the only solutions of $x^2 + y^2 = 1$ are $y = 0$, $x = \pm 1$ and $y = \pm 1$, $x = 0$. Hence the units are $\pm 1$, $\pm i$. (The group $U$ is a cyclic group of order 4, with $i$ as generator.)

Suppose next that $d > 0$. With the aid of continued fractions it will be shown in §4 of Chapter IV that the equation $x^2 - dy^2 = 1$ always has a solution in positive integers and, by doubling them, so also does the equation $x^2 - dy^2 = 4$. Hence there always exists a unit $\varepsilon > 1$. For any unit $\varepsilon > 1$ we have $\varepsilon > \pm\varepsilon'$, since $\varepsilon' = \varepsilon^{-1}$ or $-\varepsilon^{-1}$. If $\varepsilon = a + b\omega$, where $\omega$ is defined as in Proposition 11 and $a, b \in \mathbb{Z}$, then $\varepsilon' = a - b\omega$ or $a - b - b\omega$, according as $d \not\equiv 1$ or $d \equiv 1 \pmod 4$. Since $\omega$ is positive, $\varepsilon > \varepsilon'$ yields $b > 0$ and $\varepsilon > -\varepsilon'$ then yields $a > 0$. Thus every unit $\varepsilon > 1$ has the form $a + b\omega$, where $a, b \in \mathbb{N}$. Consequently there is a least unit $\varepsilon_0 > 1$. Then, for any unit $\varepsilon > 1$, there is a positive integer $n$ such that $\varepsilon_0^n \le \varepsilon < \varepsilon_0^{n+1}$. Since $\varepsilon\varepsilon_0^{-n}$ is a unit and $1 \le \varepsilon\varepsilon_0^{-n} < \varepsilon_0$, we must actually have $\varepsilon = \varepsilon_0^n$. (The group $U$ is the direct product of the cyclic group of order 2 generated by $-1$ and the infinite cyclic group generated by $\varepsilon_0$.)


As an example, take $d = 2$. Then $\varepsilon_0 = 1 + \sqrt{2}$ is a unit. Since $\varepsilon_0 > 1$ and all units greater than 1 have the form $a + b\sqrt{2}$ with $a, b \in \mathbb{N}$, it follows that all units are given by $\pm\varepsilon_0^n$ $(n \in \mathbb{Z})$.
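For small $d > 0$ the least unit $\varepsilon_0 = a + b\omega > 1$ can also be found by brute force. The sketch below is not the continued-fraction method of Chapter IV; it simply tries $b = 1, 2, \ldots$ and, for each $b$, solves $a^2 = db^2 \pm 1$ (or $(2a - b)^2 = db^2 \pm 4$ when $d \equiv 1 \pmod 4$) for a positive integer $a$. It relies on an observation not made in the text: $b$ never decreases along the powers $\varepsilon_0, \varepsilon_0^2, \ldots$, so the first $b$ that yields a unit, taken with its smallest $a$, gives $\varepsilon_0$. The name `fundamental_unit` and the search limit `b_max` are illustrative.

```python
import math


def fundamental_unit(d, b_max=100_000):
    """Least unit eps0 = a + b*omega > 1 of O_d (d > 1 square-free), as the pair (a, b).

    omega = sqrt(d) if d = 2, 3 (mod 4) and omega = (sqrt(d) - 1)/2 if d = 1 (mod 4).
    """
    for b in range(1, b_max + 1):
        candidates = []
        for sign in (-1, 1):
            if d % 4 == 1:
                target = d * b * b + 4 * sign       # (2a - b)^2 - d b^2 = +-4
            else:
                target = d * b * b + sign           # a^2 - d b^2 = +-1
            if target <= 0:
                continue
            c = math.isqrt(target)
            if c * c != target:
                continue
            a = (c + b) // 2 if d % 4 == 1 else c   # c and b have the same parity
            if a > 0:
                candidates.append(a)
        if candidates:
            return min(candidates), b               # smallest a for the first b found
    raise ValueError("no unit found with b <= b_max")


if __name__ == "__main__":
    for d in (2, 3, 5, 6, 7, 13):
        a, b = fundamental_unit(d)
        omega = math.sqrt(d) if d % 4 != 1 else (math.sqrt(d) - 1) / 2
        print(d, (a, b), a + b * omega)
```

For example, `fundamental_unit(2)` returns `(1, 1)`, the unit $1 + \sqrt{2}$ of the example above, and `fundamental_unit(7)` returns `(8, 3)`, that is $8 + 3\sqrt{7}$.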

Having determined the units, we now consider more generally the theory of divisibility in the integral domain $\mathcal{O}$. If $\alpha, \beta \in \mathcal{O}$ and $\beta$ is a proper divisor of $\alpha$, then $N(\beta)$ is a proper divisor in $\mathbb{Z}$ of $N(\alpha)$ and hence $|N(\beta)| < |N(\alpha)|$. Consequently the chain condition (#) of Chapter II is satisfied. It follows that any element of $\mathcal{O}$ which is neither zero nor a unit is a product of finitely many irreducibles. Thus it only remains to determine the irreducibles. However, this is not such a simple matter, as the following examples indicate.

The ring $\mathcal{G}$ of Gaussian integers is a Euclidean domain. However, an ordinary prime $p$ may or may not be irreducible in $\mathcal{G}$. For example, $2 = (1 + i)(1 - i)$ and neither factor is one of the units $\pm 1$, $\pm i$. On the other hand, 3 has no proper divisor $\alpha = a + bi$ which is not a unit, since $N(3) = 9$ and $N(\alpha) = a^2 + b^2 = 3$ has no solutions in integers $a, b$.

Again, consider the ring $\mathcal{O}_{-5}$ of integers of the field $\mathbb{Q}(\sqrt{-5})$. An element $\alpha = a + b\sqrt{-5}$ of $\mathcal{O}_{-5}$ cannot have norm $N(\alpha) = a^2 + 5b^2$ equal to $\pm 2$ or $\pm 3$, since the square of any ordinary integer is congruent to 0, 1 or 4 mod 5. It follows that, in the factorizations

$$6 = 2 \cdot 3 = (1 + \sqrt{-5})(1 - \sqrt{-5}),$$

all four factors are irreducible and the factorizations are essentially distinct, since $N(2) = 4$, $N(3) = 9$ and $N(1 \pm \sqrt{-5}) = 6$. Thus 2 is not a ‘prime’ in $\mathcal{O}_{-5}$ and the ‘fundamental theorem of arithmetic’ does not hold.
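The two facts used for $\mathcal{O}_{-5}$ (no element has norm 2 or 3, so none of the four factors has a proper non-unit divisor) can be confirmed by a short search. In this sketch $a + b\sqrt{-5}$ is represented by the pair of integers `(a, b)`, and `irreducible_by_norms` reproduces the norm argument of the text: it returns `True` only when no element has a norm $k$ with $1 < k < N(\alpha)$ and $k \mid N(\alpha)$. The helper names are illustrative.

```python
import math


def norm(a: int, b: int) -> int:
    """N(a + b*sqrt(-5)) = a^2 + 5*b^2."""
    return a * a + 5 * b * b


def mul(x, y):
    """(a + b*sqrt(-5)) * (c + d*sqrt(-5)) as a pair."""
    (a, b), (c, d) = x, y
    return (a * c - 5 * b * d, a * d + b * c)


def has_element_of_norm(n: int) -> bool:
    """Is a^2 + 5*b^2 = n solvable in integers?"""
    for b in range(math.isqrt(n // 5) + 1):
        rest = n - 5 * b * b
        if math.isqrt(rest) ** 2 == rest:
            return True
    return False


def irreducible_by_norms(a: int, b: int) -> bool:
    """True if no element has a norm k with 1 < k < N(alpha) and k | N(alpha).

    A factorization into two non-units would produce such a norm, so True
    certifies irreducibility; False is inconclusive.
    """
    n = norm(a, b)
    return not any(n % k == 0 and has_element_of_norm(k) for k in range(2, n))


squares_mod_5 = sorted({x * x % 5 for x in range(5)})   # residues of squares mod 5
```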

It was shown by Kummer and Dedekind in the 19th century that uniqueness of factorization could be restored by considering ideals instead of elements. Any nonzero proper ideal of $\mathcal{O}_d$ can be represented as a product of finitely many prime ideals and the representation is unique except for the order of the factors. This result will now be established.

A nonempty subset $A$ of a commutative ring $R$ is an ideal if $a, b \in A$ and $x, y \in R$ imply $ax + by \in A$. For example, $R$ and $\{0\}$ are ideals. If $a_1, \ldots, a_m \in R$, then the set $(a_1, \ldots, a_m)$ of all elements $a_1x_1 + \cdots + a_mx_m$ with $x_j \in R$ $(1 \le j \le m)$ is an ideal, the ideal generated by $a_1, \ldots, a_m$. An ideal generated by a single element is a principal ideal.


If $A$ and $B$ are ideals in $R$, then the set $AB$ of all finite sums $a_1b_1 + \cdots + a_nb_n$ with $a_j \in A$ and $b_j \in B$ $(1 \le j \le n;\ n \in \mathbb{N})$ is also an ideal, the product of $A$ and $B$. For any ideals $A, B, C$ we have

$$AB = BA, \qquad (AB)C = A(BC),$$

since multiplication in $R$ is commutative and associative.
An ideal $A \neq \{0\}$ is said to be divisible by an ideal $B$, and $B$ is said to be a factor of $A$, if there exists an ideal $C$ such that $A = BC$. For example, $A$ is divisible by itself and by $R$, since $A = AR$. Thus $R$ is an identity element for multiplication of ideals.

Now take $R = \mathcal{O}_d$ to be the ring of all integers of the quadratic field $\mathbb{Q}(\sqrt{d})$. We will show that in this case much more can be said.


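Proposition 14 Let $A \neq \{0\}$ be an ideal in $\mathcal{O}_d$. Then there exist $\beta, \gamma \in A$ such that every $\alpha \in A$ can be uniquely represented in the form

$$\alpha = m\beta + n\gamma \quad (m, n \in \mathbb{Z}).$$

Furthermore, if $\omega$ is defined as in Proposition 11, we may take $\beta = a$, $\gamma = b + c\omega$, where $a, b, c \in \mathbb{Z}$, $a > 0$, $c > 0$, $c$ divides both $a$ and $b$, and $ac$ divides $\gamma\gamma'$, i.e.

$$\begin{aligned} b^2 - dc^2 &\equiv 0 \pmod{ac} && \text{if } d \equiv 2 \text{ or } 3 \pmod 4,\\ b(b - c) - (d - 1)c^2/4 &\equiv 0 \pmod{ac} && \text{if } d \equiv 1 \pmod 4. \end{aligned}$$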


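Proof Since $A$ is an ideal, the set $J$ of all $z \in \mathbb{Z}$ such that $y + z\omega \in A$ for some $y \in \mathbb{Z}$ is an ideal in $\mathbb{Z}$. Moreover $J \neq \{0\}$, since $A \neq \{0\}$ and $\alpha \in A$ implies $\alpha\omega \in A$. Since $\mathbb{Z}$ is a principal ideal domain, it follows that there exists $c > 0$ such that $J = \{nc : n \in \mathbb{Z}\}$. Since $c \in J$, there exists $b \in \mathbb{Z}$ such that $\gamma := b + c\omega \in A$.

Moreover $A$ contains some nonzero $x \in \mathbb{Z}$, since $\alpha \in A$ implies $\alpha\alpha' \in A$. Since the set $I$ of all $x \in \mathbb{Z} \cap A$ is an ideal in $\mathbb{Z}$, there exists $a > 0$ such that $I = \{ma : m \in \mathbb{Z}\}$. For any $\alpha = y + z\omega \in A$ we have $z = nc$ for some $n \in \mathbb{Z}$ and $\alpha - n\gamma = y - nb = ma$ for some $m \in \mathbb{Z}$. Thus $\alpha = m\beta + n\gamma$ with $\beta = a$. The representation is unique, since $\gamma$ is irrational.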

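Since $\beta\omega \in A$, we have

$$a\omega = ra + s(b + c\omega) \quad \text{for unique } r, s \in \mathbb{Z}.$$

Thus $a = sc$ and $ra + sb = 0$, which together imply $b = -rc$. Since $\gamma\omega \in A$, we have also

$$(b + c\omega)\omega = ma + n(b + c\omega) \quad \text{for unique } m, n \in \mathbb{Z}.$$

If $d \equiv 2$ or $3 \pmod 4$, then $\omega^2 = d$. In this case $n = -r$, $cd = ma - rb$ and hence $dc^2 = mac + b^2$. If $d \equiv 1 \pmod 4$, then $\omega^2 = -\omega + (d - 1)/4$. Hence $n = -(r + 1)$, $c(d - 1)/4 = ma - rb - b$ and $(d - 1)c^2/4 = mac + b(b - c)$.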
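The proof of Proposition 14 is constructive, and the same reduction can be sketched in code. Writing $y + z\omega$ as the pair `(y, z)`, an ideal generated by elements $g_1, \ldots, g_k$ is spanned over $\mathbb{Z}$ by the $g_j$ and the $g_j\omega$. A Euclidean reduction on the $\omega$-coordinates then produces $c$ (the positive generator of $J$), and the gcd of the leftover rational parts produces $a$ (the positive generator of $I$). Reducing $b$ modulo $a$ is a normalisation added here, not part of the proposition, and all names are hypothetical.

```python
from math import gcd

# An element y + z*omega of O_d is stored as the integer pair (y, z),
# with omega as in Proposition 11.


def times_omega(v, d):
    """Return (y + z*omega)*omega as a pair."""
    y, z = v
    if d % 4 == 1:
        return (z * ((d - 1) // 4), y - z)     # omega^2 = -omega + (d - 1)/4
    return (z * d, y)                          # omega^2 = d


def ideal_basis(gens, d):
    """Triple (a, b, c) with A = {m*a + n*(b + c*omega)}, A the ideal generated by gens."""
    # Over Z, the ideal is spanned by each generator g and by g*omega.
    vectors = [tuple(g) for g in gens] + [times_omega(g, d) for g in gens]
    current, zero_rows = vectors[0], []
    for v in vectors[1:]:
        u, w = current, v
        while w[1] != 0:                       # Euclid's algorithm on the omega-coordinates
            q = u[1] // w[1]
            u, w = w, (u[0] - q * w[0], u[1] - q * w[1])
        current = u
        zero_rows.append(w)                    # an element of A with zero omega-coordinate
    if current[1] < 0:
        current = (-current[0], -current[1])
    a = 0
    for y, _ in zero_rows:
        a = gcd(a, y)                          # positive generator of I (the integers in A)
    c = current[1]                             # positive generator of J
    if a == 0 or c == 0:
        raise ValueError("the generators must not all be zero")
    return a, current[0] % a, c                # b reduced mod a (a normalisation)


def satisfies_prop14(a, b, c, d):
    """Check a, c > 0, c | a, c | b and ac | gamma*gamma' for gamma = b + c*omega."""
    if a <= 0 or c <= 0 or a % c or b % c:
        return False
    if d % 4 == 1:
        t = b * (b - c) - (d - 1) // 4 * c * c
    else:
        t = b * b - d * c * c
    return t % (a * c) == 0
```

For example, `ideal_basis([(3, 0), (1, 1)], -5)` returns `(3, 1, 1)`: the ideal $(3, 1 + \sqrt{-5})$ of $\mathcal{O}_{-5}$ is $\{3m + n(1 + \sqrt{-5})\}$, and $ac = 3$ divides $b^2 - dc^2 = 6$.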


If $A$ is an ideal in $\mathcal{O}_d$, then the set $A' = \{\alpha' : \alpha \in A\}$ of all conjugates of elements of $A$ is also an ideal in $\mathcal{O}_d$. We call $A'$ the conjugate of $A$.


Proposition 15 If $A \neq \{0\}$ is an ideal in $\mathcal{O}_d$, then $AA' = l\mathcal{O}_d$ for some $l \in \mathbb{N}$.


Proof Choose $\beta, \gamma$ so that $A = \{m\beta + n\gamma : m, n \in \mathbb{Z}\}$. Then $AA'$ consists of all integral linear combinations of $\beta\beta'$, $\beta\gamma'$, $\beta'\gamma$ and $\gamma\gamma'$. Furthermore $r = \beta\beta'$, $s = \beta\gamma' + \beta'\gamma$ and $t = \gamma\gamma'$ are all in $\mathbb{Z}$. If $l$ is the greatest common divisor of $r$, $s$ and $t$, then $l \in AA'$, by the Bézout identity, and hence $l\mathcal{O}_d \subseteq AA'$.

On the other hand, $\beta\gamma'$ and $\beta'\gamma$ are roots of the quadratic equation

$$x^2 - sx + rt = 0$$

with integer coefficients $s = \beta\gamma' + \beta'\gamma$ and $rt = \beta\beta'\gamma\gamma'$. It follows that $\beta\gamma'/l$ and $\beta'\gamma/l$ are roots of the quadratic equation

$$y^2 - (s/l)y + rt/l^2 = 0,$$

which also has integer coefficients. Since $\beta\gamma'/l$ and $\beta'\gamma/l$ are in $\mathbb{Q}(\sqrt{d})$, this means that they are in $\mathcal{O}_d$. Thus $\beta\gamma'$ and $\beta'\gamma$ are in $l\mathcal{O}_d$. Since also $\beta\beta'$ and $\gamma\gamma'$ are in $l\mathcal{O}_d$, it follows that $AA' \subseteq l\mathcal{O}_d$.


If in the proof of Proposition 15 we choose $\beta = a$ and $\gamma = b + c\omega$ as in the statement of Proposition 14, then in the statement of Proposition 15 we will have $l = ac$. Since the proof of this when $d \equiv 1 \pmod 4$ is similar, we give the proof only for $d \equiv 2$ or $3 \pmod 4$. In this case $\omega = \sqrt{d}$ and hence $r = a^2$, $s = 2ab$, $t = b^2 - dc^2$. We wish to show that $ac$ is the greatest common divisor of $r$, $s$ and $t$. Thus if we put

$$a = cu, \qquad b = cv, \qquad t = acw,$$

then we wish to show that $u$, $2v$ and $w$ have greatest common divisor 1. Since $uw = v^2 - d$ and $d$ is square-free, a common divisor greater than 1 can only be 2. But if 2 were a common divisor, we would have $v^2 \equiv d \pmod 4$, which is impossible, because $d \equiv 2$ or $3 \pmod 4$.
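The claim $l = ac$ can be tested by brute force over all triples $(a, b, c)$ that satisfy the conditions of Proposition 14. The sketch below (hypothetical helper names) computes $l = \gcd(r, s, t)$ for $\beta = a$, $\gamma = b + c\omega$. It uses $\gamma + \gamma' = 2b$ and $\gamma\gamma' = b^2 - dc^2$ when $d \equiv 2, 3 \pmod 4$ and, for the case whose proof the text omits, $\gamma + \gamma' = 2b - c$ and $\gamma\gamma' = b(b - c) - (d - 1)c^2/4$ when $d \equiv 1 \pmod 4$.

```python
from math import gcd


def gamma_norm(b, c, d):
    """gamma*gamma' for gamma = b + c*omega."""
    if d % 4 == 1:
        return b * (b - c) - (d - 1) // 4 * c * c
    return b * b - d * c * c


def satisfies_prop14(a, b, c, d):
    """Conditions of Proposition 14: a, c > 0, c | a, c | b and ac | gamma*gamma'."""
    if a <= 0 or c <= 0 or a % c or b % c:
        return False
    return gamma_norm(b, c, d) % (a * c) == 0


def ell(a, b, c, d):
    """l = gcd(r, s, t) with r = beta*beta', s = beta*gamma' + beta'*gamma, t = gamma*gamma'."""
    trace = 2 * b - c if d % 4 == 1 else 2 * b    # gamma + gamma'
    r, s, t = a * a, a * trace, gamma_norm(b, c, d)
    return gcd(gcd(r, s), t)
```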

We can now show that multiplication of ideals satisfies the cancellation law:

Proposition 16 If $A, B, C$ are ideals in $\mathcal{O}_d$ with $A \neq \{0\}$, then $AB = AC$ implies $B = C$.

Proof By multiplying by the conjugate $A'$ of $A$ we obtain $AA'B = AA'C$ and hence, by Proposition 15, $lB = lC$ for some positive integer $l$. But this implies $B = C$.


Proposition 17 Let $A$ and $B$ be nonzero ideals in $\mathcal{O}_d$. Then $A$ is divisible by $B$ if and only if $A \subseteq B$.

Proof If $A = BC$ for some ideal $C$, then $A \subseteq B$, by the definition of the product of two ideals.

Conversely, suppose $A \subseteq B$. By Proposition 15, $BB' = l\mathcal{O}_d$ for some positive integer $l$. Hence $AB' \subseteq l\mathcal{O}_d$. It follows that $AB' = lC$ for some ideal $C$. Thus $AB' = BB'C$ and so, by Proposition 16, $A = BC$.


Corollary 18 Let $A$ and $B$ be nonzero ideals in $\mathcal{O}_d$. If $D$ is the set of all elements $a + b$, with $a \in A$ and $b \in B$, then $D$ is an ideal and is a factor of both $A$ and $B$. Moreover, every common factor of $A$ and $B$ is also a factor of $D$.

Proof It follows at once from its definition that $D$ is an ideal. Moreover $D$ contains both $A$ and $B$, since 0 is an element of any ideal. Evidently also any ideal $C$ which contains both $A$ and $B$ also contains $D$. The result now follows from Proposition 17.
In the terminology of Chapter II, §1, this shows that any two nonzero ideals in $\mathcal{O}_d$ have a greatest common divisor.

In a commutative ring $R$, an ideal $A \neq R, \{0\}$ is said to be irreducible if its only factors are $A$ and $R$. It is said to be maximal if the only ideals containing $A$ are $A$ and $R$. It is said to be prime if, whenever $A$ divides the product of two ideals, it also divides at least one of the factors.

By Proposition 17, an ideal in $\mathcal{O}_d$ is irreducible if and only if it is maximal. As we saw in §1 of Chapter II, the existence of greatest common divisors implies that an ideal in $\mathcal{O}_d$ is irreducible if and only if it is prime. (These equivalences do not hold in all commutative rings, but they do hold for the ring of all algebraic integers in any given algebraic number field, and also for the rings associated with algebraic curves.)


Proposition 19 A nonzero ideal $A$ in $\mathcal{O}_d$ has only finitely many factors.

Proof Since $AA' = l\mathcal{O}_d$ for some positive integer $l$, any factor $B$ of $A$ is also a factor of $l\mathcal{O}_d$ and so contains $l$. Proposition 14 implies, in particular, that $B$ is generated by two elements, say $B = (\beta_1, \beta_2)$. A fortiori, $B = (\beta_1, \beta_2, l)$ and hence, for any $\gamma_1, \gamma_2 \in \mathcal{O}_d$, also

$$B = (\beta_1 - l\gamma_1, \beta_2 - l\gamma_2, l).$$

We can choose $\gamma_1 \in \mathcal{O}_d$ so that in the representation

$$\beta_1 - l\gamma_1 = a_1 + b_1\omega \quad (a_1, b_1 \in \mathbb{Z})$$

we have $0 \le a_1, b_1 < l$. Similarly we can choose $\gamma_2 \in \mathcal{O}_d$ so that in the representation

$$\beta_2 - l\gamma_2 = a_2 + b_2\omega \quad (a_2, b_2 \in \mathbb{Z})$$

we have $0 \le a_2, b_2 < l$. It follows that there are at most $l^4$ different possibilities for the ideal $B$.


Corollary 20 There exists no infinite sequence $\{A_n\}$ of nonzero ideals in $\mathcal{O}_d$ such that, for every $n$, $A_{n+1}$ divides $A_n$ and $A_{n+1} \neq A_n$.


In the terminology of Chapter II, this shows that the set of all nonzero ideals in $\mathcal{O}_d$ satisfies the chain condition (#). Since also the conclusion of Proposition II.1 holds, the argument in §1 of Chapter II now shows that any nonzero proper ideal in $\mathcal{O}_d$ is a product of finitely many prime ideals and the representation is unique apart from the order of the factors.

It remains to determine the prime ideals. This is accomplished by the following three propositions.


Proposition 21 For each prime ideal $P$ in $\mathcal{O}_d$ there is a unique prime number $p$ such that $P$ divides $p\mathcal{O}_d$. Furthermore, for any prime number $p$ there is a prime ideal $P$ in $\mathcal{O}_d$ such that exactly one of the following alternatives holds:

(i) $p\mathcal{O}_d = PP'$ and $P \neq P'$;
(ii) $p\mathcal{O}_d = P = P'$;
(iii) $p\mathcal{O}_d = P^2$ and $P = P'$.


Proof If $P$ is a prime ideal in $\mathcal{O}_d$, then $PP' = l\mathcal{O}_d$ for some positive integer $l$. Moreover $l > 1$, since $l \in P$ and $1 \notin P$. If $l = mn$, where $m$ and $n$ are positive integers greater than 1, then $P$ divides either $m\mathcal{O}_d$ or $n\mathcal{O}_d$. By repeating the argument it follows that $P$ divides $p\mathcal{O}_d$ for some prime divisor $p$ of $l$. The prime number $p$ is uniquely determined by the prime ideal $P$ since, by the Bézout identity, if $P$ contained distinct primes it would also contain their greatest common divisor 1.

Now let $p$ be any prime number and let the factorisation of $p\mathcal{O}_d$ into a product of positive powers of distinct prime ideals be

$$p\mathcal{O}_d = P_1^{e_1} \cdots P_s^{e_s}.$$

If we put $Q_j = P_j'$ $(1 \le j \le s)$, then also

$$p\mathcal{O}_d = Q_1^{e_1} \cdots Q_s^{e_s}.$$

But $P_jQ_j = n_j\mathcal{O}_d$ for some integer $n_j > 1$ and hence, multiplying the two factorisations of $p\mathcal{O}_d$,

$$p^2 = n_1^{e_1} \cdots n_s^{e_s}.$$

Evidently the only possibilities are

(i) $s = 2$, $n_1 = n_2 = p$, $e_1 = e_2 = 1$;
(ii) $s = 1$, $n_1 = p^2$, $e_1 = 1$;
(iii) $s = 1$, $n_1 = p$, $e_1 = 2$.

Since the factorisation is unique apart from order, this yields the three possibilities in the statement of the proposition.


Proposition 21 does not tell us which of the three possibilities holds for a given prime $p$. For $p \neq 2$, the next result gives an answer in terms of the Legendre symbol.


Proposition 22 Let $p$ be an odd prime. Then, in the statement of Proposition 21, (i), (ii), or (iii) holds according as

$$p \nmid d \text{ and } (d/p) = 1, \qquad p \nmid d \text{ and } (d/p) = -1, \qquad \text{or} \qquad p \mid d.$$


Proof Suppose first that $p \nmid d$ and that there exists $a \in \mathbb{Z}$ such that $a^2 \equiv d \pmod p$. Then $p \nmid a$ and $a^2 - d = pb$ for some $b \in \mathbb{Z}$. If $P = (p, a + \sqrt{d})$, then $P' = (p, a - \sqrt{d})$ and

$$PP' = p\,(p, a + \sqrt{d}, a - \sqrt{d}, b).$$

Since $(p, a + \sqrt{d}, a - \sqrt{d}, b)$ contains $2a$, which is relatively prime to $p$, it also contains 1. Hence $PP' = p\mathcal{O}_d$. Furthermore $P \neq P'$, since $P = P'$ would imply $2a \in P$ and hence $1 \in P$. We do not need to prove that $P$ is a prime ideal, since what we have already established is incompatible with cases (ii) and (iii) of Proposition 21.
Suppose next that $p \mid d$. Then $d = pe$ for some $e \in \mathbb{Z}$ and $p \nmid e$, since $d$ is square-free. If $P = (p, \sqrt{d})$, then

$$P^2 = p\,(p, \sqrt{d}, e) = p\mathcal{O}_d,$$

since $(p, e) = 1$. Since we cannot be in cases (i) or (ii) of Proposition 21, we must be in case (iii).

Suppose conversely that either (i) or (iii) of Proposition 21 holds. Then the corresponding prime ideal $P$ contains $p$. Choose $\beta = a$ and $\gamma = b + c\omega$ as in Proposition 14, so that

$$P = \{m\beta + n\gamma : m, n \in \mathbb{Z}\}.$$

In the present case we must have $a = p$, since $p \in P$ and $1 \notin P$. We must also have $c = 1$, since $PP' = p\mathcal{O}_d$ implies $ac = p$. With these values of $a$ and $c$ the final condition of Proposition 14 takes the form

$$\begin{aligned} b^2 &\equiv d \pmod p && \text{if } d \equiv 2 \text{ or } 3 \pmod 4,\\ b(b - 1) &\equiv (d - 1)/4 \pmod p && \text{if } d \equiv 1 \pmod 4. \end{aligned}$$

Thus in the latter case $(2b - 1)^2 \equiv d \pmod p$. In either case, if $p \nmid d$, then $(d/p) = 1$. This proves that if $p \nmid d$ and $(d/p) = -1$, then we must be in case (ii) of Proposition 21.


Proposition 23 Let $p = 2$. Then, in the statement of Proposition 21, (i), (ii), or (iii) holds according as

$$d \equiv 1 \pmod 8, \qquad d \equiv 5 \pmod 8, \qquad \text{or} \qquad d \equiv 2, 3 \pmod 4.$$


Proof Since the proof is similar to that of the previous proposition, we will omit some of the detail. Suppose first that $d \equiv 1 \pmod 8$. If $P = (2, (1 - \sqrt{d})/2)$, then $P' = (2, (1 + \sqrt{d})/2)$ and

$$PP' = 2\,(2, (1 - \sqrt{d})/2, (1 + \sqrt{d})/2, (1 - d)/8).$$

It follows that $PP' = 2\mathcal{O}_d$ and $P \neq P'$.

Suppose next that $d \equiv 2 \pmod 4$. Then $d = 2e$, where $e$ is odd. If $P = (2, \sqrt{d})$, then

$$P^2 = 2\,(2, \sqrt{d}, e) = 2\mathcal{O}_d.$$

Similarly, if $d \equiv 3 \pmod 4$ and $P = (2, 1 + \sqrt{d})$, then

$$P^2 = 2\,(2, 1 + \sqrt{d}, (1 + d)/2 + \sqrt{d}) = 2\mathcal{O}_d.$$


Suppose conversely that either (i) or (iii) of Proposition 21 holds. Then the corresponding prime ideal $P$ contains 2. Choose $\beta = a$ and $\gamma = b + c\omega$ as in Proposition 14, so that

$$P = \{m\beta + n\gamma : m, n \in \mathbb{Z}\}.$$

In the present case we must have $a = 2$, $c = 1$ and

$$b(b - 1) \equiv (d - 1)/4 \pmod 2 \quad \text{if } d \equiv 1 \pmod 4.$$

Since $b(b - 1)$ is even, it follows that $d \not\equiv 5 \pmod 8$. This proves that if $d \equiv 5 \pmod 8$, then we must be in case (ii) of Proposition 21.
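Propositions 22 and 23 together give a complete rule for how a prime $p$ behaves in $\mathcal{O}_d$. The sketch below encodes that rule, computing $(d/p)$ by Euler's criterion (a choice made for this sketch; Proposition 22 only needs Legendre's definition). It cross-checks the rule against the condition used in the converse halves of both proofs: cases (i) and (iii) occur exactly when some $b$ satisfies the final congruence of Proposition 14 with $a = p$, $c = 1$. The labels `"split"`, `"inert"` and `"ramified"` for cases (i), (ii) and (iii) are conventional names, not the text's, and the function names are hypothetical.

```python
def legendre(d: int, p: int) -> int:
    """Legendre symbol (d/p) for an odd prime p, by Euler's criterion."""
    r = pow(d % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r            # otherwise r is 0 or 1


def splitting_type(d: int, p: int) -> str:
    """Case of Proposition 21 for the prime p in O_d, by Propositions 22 and 23.

    'split' = case (i), 'inert' = case (ii), 'ramified' = case (iii).
    """
    if p == 2:
        if d % 4 in (2, 3):
            return "ramified"
        return "split" if d % 8 == 1 else "inert"     # d = 1 or 5 mod 8
    if d % p == 0:
        return "ramified"
    return "split" if legendre(d, p) == 1 else "inert"


def has_basis_residue(d: int, p: int) -> bool:
    """Is there b with the final condition of Proposition 14 for a = p, c = 1?"""
    if d % 4 == 1:
        return any((b * (b - 1) - (d - 1) // 4) % p == 0 for b in range(p))
    return any((b * b - d) % p == 0 for b in range(p))
```

For example, `splitting_type(-1, 5)` is `"split"`, matching $5 = (2 + i)(2 - i)$ in $\mathcal{G}$, while `splitting_type(-1, 3)` is `"inert"`, matching the discussion of 3 above.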


Proposition 22 uses only Legendre’s definition of the Legendre symbol. What does the law of quadratic reciprocity tell us? By Proposition 4, if $p$ and $q$ are distinct odd primes and $d$ an integer not divisible by $p$ such that $q \equiv p \pmod{4d}$, then $(d/p) = (d/q)$. Consequently, by Proposition 22, whether (i) or (ii) holds in Proposition 21 depends only on the residue class of $p$ mod $4d$. Thus, for given $d$, we need to determine the behaviour of only finitely many primes $p$.
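The dependence on the residue class of $p$ mod $4d$ can be observed numerically. The sketch below groups the odd primes $p \nmid d$ up to a bound by their class mod $4|d|$ and checks that $(d/p)$, computed by Euler's criterion, is constant on each class. A finite check like this illustrates Proposition 4 but does not prove it; the function names and the default bound are illustrative.

```python
from math import isqrt


def legendre(d: int, p: int) -> int:
    """Legendre symbol (d/p) for an odd prime p, by Euler's criterion."""
    r = pow(d % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def odd_primes_up_to(n: int) -> list:
    """Odd primes <= n, by trial division."""
    return [p for p in range(3, n + 1) if all(p % q for q in range(2, isqrt(p) + 1))]


def symbol_depends_only_on_class(d: int, modulus: int = 0, bound: int = 3000) -> bool:
    """Is (d/p) constant on each class p mod `modulus` (default 4|d|), for odd primes p <= bound?"""
    modulus = modulus or 4 * abs(d)
    seen = {}
    for p in odd_primes_up_to(bound):
        if d % p == 0:
            continue
        symbol = legendre(d, p)
        if seen.setdefault(p % modulus, symbol) != symbol:
            return False
    return True
```

Passing a smaller modulus shows why the factor 4 matters: `symbol_depends_only_on_class(3, modulus=3)` returns `False`, since $(3/7) = -1$ while $(3/13) = 1$ although $7 \equiv 13 \pmod 3$.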

We mention without proof some further properties of the ring $\mathcal{O}_d$. We say that two nonzero ideals $A, \tilde{A}$ in $\mathcal{O}_d$ are equivalent, and we write $A \sim \tilde{A}$, if there exist nonzero principal ideals $(\alpha), (\tilde{\alpha})$ such that $(\alpha)A = (\tilde{\alpha})\tilde{A}$. It is easily verified that this is indeed an equivalence relation. Moreover, if $A \sim \tilde{A}$ and $B \sim \tilde{B}$, then $AB \sim \tilde{A}\tilde{B}$. Consequently, if we call an equivalence class of ideals an ideal class, we can without ambiguity define the product of two ideal classes. The set of ideal classes acquires in this way the structure of a commutative group, the ideal class containing the conjugate $A'$ of $A$ being the inverse of the ideal class containing $A$. It may be shown that this ideal class group is finite. The order of the group, i.e. the number of different ideal classes, is called the class number of the quadratic field $\mathbb{Q}(\sqrt{d})$ and is traditionally denoted by $h(d)$. The ring $\mathcal{O}_d$ is a principal ideal domain if and only if $h(d) = 1$. (It may be shown that $\mathcal{O}_d$ is a factorial domain only if it is a principal ideal domain.)

The theory of quadratic fields has been extensively generalized. An algebraic number field $K$ is a field containing the field $\mathbb{Q}$ of rational numbers and of finite dimension as a vector space over $\mathbb{Q}$. An algebraic integer is a root of a monic polynomial $x^n + a_1x^{n-1} + \cdots + a_n$ with coefficients $a_1, \ldots, a_n \in \mathbb{Z}$. The set of all algebraic integers in a given algebraic number field $K$ is a ring $\mathcal{O}(K)$. It may be shown that, also in $\mathcal{O}(K)$, any nonzero proper ideal can be represented as a product of prime ideals and the representation is unique except for the order of the factors. One may also construct the ideal class group of $K$ and show that it is finite, its order being the class number of $K$.

Some of the motivation for these generalizations came from ‘Fermat’s last theorem’. Fermat (c. 1640) asserted that the equation $x^n + y^n = z^n$ has no solutions in positive integers $x, y, z$ if $n > 2$. In Proposition 12 we proved Fermat’s assertion for $n = 3$. To prove the assertion in general it is sufficient to prove it when $n = 4$ and when $n = p$ is an odd prime, since if $x^{km} + y^{km} = z^{km}$, then $(x^k)^m + (y^k)^m = (z^k)^m$. Fermat himself gave a proof for $n = 4$, which is reproduced in Chapter XIII. Proofs for $n = 3$, 5 and 7 were given by Euler (1760-1770), Legendre (1825) and Lamé (1839) respectively.

Kummer (1850) made a remarkable advance beyond this by proving that the assertion holds whenever $n = p$ is a ‘regular’ prime. Here a prime $p$ is said to be regular if it does not divide the class number of the cyclotomic field $\mathbb{Q}(\zeta_p)$, obtained by adjoining to $\mathbb{Q}$ a $p$-th root of unity $\zeta_p$. Kummer converted his result into a practical test by further proving that a prime $p > 3$ is regular if and only if it does not divide the numerator of any of the Bernoulli numbers $B_2, B_4, \ldots, B_{p-3}$.

The only irregular primes less than 100 are 37, 59 and 67. Other methods for dealing with irregular primes were devised by Kummer (1857) and Vandiver (1929). By 1993 Fermat’s assertion had been established in this way for all $n$ less than four million. However, these methods did not lead to a complete proof of ‘Fermat’s last theorem’. As will be seen in Chapter XIII, a complete solution was first found by Wiles (1995), using quite different methods.
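Kummer’s criterion makes the list of irregular primes below 100 easy to reproduce. The sketch computes Bernoulli numbers exactly from the recurrence $\sum_{k=0}^{m} \binom{m+1}{k} B_k = 0$ $(m \ge 1)$, a standard identity not derived in the text (it gives the convention $B_1 = -1/2$, which does not affect the even-indexed numbers used here). It then tests whether $p$ divides the numerator of any of $B_2, B_4, \ldots, B_{p-3}$. The function names are illustrative.

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(n_max):
    """[B_0, ..., B_{n_max}] from sum_{k=0}^{m} C(m+1, k) B_k = 0 for m >= 1."""
    B = [Fraction(1)]
    for m in range(1, n_max + 1):
        B.append(-sum(comb(m + 1, k) * B[k] for k in range(m)) / (m + 1))
    return B


def is_prime(n):
    return n >= 2 and all(n % q for q in range(2, int(n ** 0.5) + 1))


def irregular_primes(limit):
    """Primes 3 < p < limit dividing the numerator of some B_2, B_4, ..., B_{p-3}."""
    B = bernoulli_numbers(max(limit - 3, 2))
    return [p for p in range(5, limit)
            if is_prime(p) and any(B[k].numerator % p == 0 for k in range(2, p - 2, 2))]


if __name__ == "__main__":
    print(irregular_primes(100))
```

Running it prints `[37, 59, 67]`, in agreement with the list above.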


3 Multiplicative Functions 


We define a function $f : \mathbb{N} \to \mathbb{C}$ to be an arithmetical function. The set of all arithmetical functions can be given the structure of a commutative ring in the following way.

For any two functions $f, g : \mathbb{N} \to \mathbb{C}$, we define their convolution or Dirichlet product $f * g : \mathbb{N} \to \mathbb{C}$ by

$$f * g(n) = \sum_{d \mid n} f(d)\,g(n/d).$$


Dirichlet multiplication satisfies the usual commutative and associative laws:

Lemma 24 For any three functions $f, g, h : \mathbb{N} \to \mathbb{C}$,

$$f * g = g * f, \qquad f * (g * h) = (f * g) * h.$$


Proof Since $n/d$ runs through the positive divisors of $n$ at the same time as $d$,

$$f * g(n) = \sum_{d \mid n} f(d)\,g(n/d) = \sum_{d \mid n} f(n/d)\,g(d) = g * f(n).$$

To prove the associative law, put $G = g * h$. Then

$$f * G(n) = \sum_{de = n} f(d)G(e) = \sum_{de = n} f(d) \sum_{d'd'' = e} g(d')h(d'') = \sum_{dd'd'' = n} f(d)\,g(d')\,h(d'').$$

Similarly, if we put $F = f * g$, we obtain

$$F * h(n) = \sum_{de = n} F(e)h(d) = \sum_{de = n} \sum_{d'd'' = e} f(d')g(d'')h(d) = \sum_{dd'd'' = n} f(d')\,g(d'')\,h(d).$$

Hence $F * h(n) = f * G(n)$.
For any two functions $f, g : \mathbb{N} \to \mathbb{C}$, we define their sum $f + g : \mathbb{N} \to \mathbb{C}$ in the natural way:

$$(f + g)(n) = f(n) + g(n).$$


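It is obvious that addition is commutative and associative, and that the distributive law holds:

$$f * (g + h) = f * g + f * h.$$

The function $\delta : \mathbb{N} \to \mathbb{C}$, defined by

$$\delta(n) = 1 \text{ or } 0 \quad \text{according as } n = 1 \text{ or } n > 1,$$

acts as an identity element for Dirichlet multiplication:

$$\delta * f = f \quad \text{for every } f : \mathbb{N} \to \mathbb{C},$$

since

$$\delta * f(n) = \sum_{d \mid n} \delta(d)\,f(n/d) = f(n).$$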
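The commutative, associative and distributive laws, and the identity property of $\delta$, can be spot-checked with a direct implementation of the Dirichlet product. In this sketch arithmetical functions are ordinary Python callables on positive integers; `dirichlet`, `add`, `delta` and the three sample functions `f`, `g`, `h` are hypothetical choices made only for illustration.

```python
def divisors(n: int) -> list:
    """Positive divisors of n, by trial division."""
    return [d for d in range(1, n + 1) if n % d == 0]


def dirichlet(f, g):
    """Dirichlet product: (f * g)(n) = sum over d | n of f(d) * g(n // d)."""
    return lambda n: sum(f(d) * g(n // d) for d in divisors(n))


def add(f, g):
    """Pointwise sum (f + g)(n) = f(n) + g(n)."""
    return lambda n: f(n) + g(n)


def delta(n: int) -> int:
    """Identity for Dirichlet multiplication: 1 at n = 1, else 0."""
    return 1 if n == 1 else 0


# Arbitrary integer-valued sample functions, chosen only for testing.
def f(n: int) -> int:
    return n * n - 3


def g(n: int) -> int:
    return n % 5


def h(n: int) -> int:
    return (-1) ** n
```

Integer-valued sample functions keep every comparison exact, so the laws can be checked with `==` rather than a tolerance.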


Thus the set $\mathcal{A}$ of all arithmetical functions is indeed a commutative ring.

For any function $f : \mathbb{N} \to \mathbb{C}$ which is not identically zero, put $|f| = \nu(f)^{-1}$, where $\nu(f)$ is the least positive integer $n$ such that $f(n) \neq 0$, and put $|0| = 0$. Then

$$|f * g| = |f|\,|g|, \qquad |f + g| \le \max(|f|, |g|) \quad \text{for all } f, g \in \mathcal{A}.$$

Hence the ring $\mathcal{A}$ of all arithmetical functions is actually an integral domain. It is readily shown that the set of all $f \in \mathcal{A}$ such that $|f| < 1$ is an ideal, but not a principal ideal. (Although $\mathcal{A}$ is not a principal ideal domain, it may be shown that it is a factorial domain.)

The next result shows that the functions $f \in \mathcal{A}$ such that $|f| = 1$ are the units in the ring $\mathcal{A}$:

Lemma 25 For any function $f : \mathbb{N} \to \mathbb{C}$, there is a function $f^{-1} : \mathbb{N} \to \mathbb{C}$ such that $f^{-1} * f = \delta$ if and only if $f(1) \neq 0$. The inverse $f^{-1}$ is uniquely determined and $f^{-1}(1)f(1) = 1$.

Proof Suppose $g : \mathbb{N} \to \mathbb{C}$ has the property that $g * f = \delta$. Then $g(1)f(1) = 1$. Thus $f(1) \neq 0$, and $g(1) = 1/f(1)$ is non-zero and uniquely determined. If $n > 1$, then

$$\sum_{d \mid n} g(d)\,f(n/d) = 0.$$

Hence

$$g(n)f(1) = -\sum_{d \mid n,\ d < n} g(d)\,f(n/d).$$

It follows by induction that $g(n)$ is uniquely determined for every $n \in \mathbb{N}$. Conversely, if $f(1) \neq 0$ and $g$ is defined inductively in this way, then $g * f = \delta$.
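The recursion in the proof of Lemma 25 is itself an algorithm for $f^{-1}$. The sketch below implements it with exact rational arithmetic (`fractions.Fraction`) and returns the list of values $g(1), \ldots, g(N)$, with index 0 unused; `dirichlet_inverse` is a hypothetical name. Applied to the constant function 1 (the function $i$ introduced later in this section), it produces the values $1, -1, -1, 0, -1, 1, \ldots$.

```python
from fractions import Fraction


def dirichlet_inverse(f, n_max):
    """[_, g(1), ..., g(n_max)] with g * f = delta, via the recursion of Lemma 25 (index 0 unused)."""
    f1 = Fraction(f(1))
    if f1 == 0:
        raise ValueError("f(1) must be nonzero")
    g = [Fraction(0)] * (n_max + 1)
    g[1] = 1 / f1                                   # g(1) f(1) = 1
    for n in range(2, n_max + 1):
        # g(n) f(1) = - sum over d | n, d < n, of g(d) f(n/d)
        s = sum(g[d] * f(n // d) for d in range(1, n) if n % d == 0)
        g[n] = -s / f1
    return g
```

For example, `dirichlet_inverse(lambda n: 1, 10)[1:]` equals `[1, -1, -1, 0, -1, 1, -1, 0, 0, 1]` (as `Fraction` values).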
It follows from Lemma 25 that the set of all arithmetical functions $f : \mathbb{N} \to \mathbb{C}$ such that $f(1) \neq 0$ is an abelian group under Dirichlet multiplication.

A function $f : \mathbb{N} \to \mathbb{C}$ is said to be multiplicative if it is not identically zero and if

$$f(mn) = f(m)f(n) \quad \text{for all } m, n \text{ with } (m, n) = 1.$$

It follows that $f(1) = 1$, since $f(n) \neq 0$ for some $n$ and $f(n) = f(n)f(1)$. Any multiplicative function $f : \mathbb{N} \to \mathbb{C}$ is uniquely determined by its values at the prime powers, since if

$$m = p_1^{\alpha_1} \cdots p_s^{\alpha_s},$$

where $p_1, \ldots, p_s$ are distinct primes and $\alpha_1, \ldots, \alpha_s \in \mathbb{N}$, then

$$f(m) = f(p_1^{\alpha_1}) \cdots f(p_s^{\alpha_s}).$$
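If

$$m = \prod_p p^{\alpha_p}, \qquad n = \prod_p p^{\beta_p},$$

where $\alpha_p, \beta_p \ge 0$, then

$$(m, n) = \prod_p p^{\gamma_p}, \qquad [m, n] = \prod_p p^{\delta_p},$$

where $\gamma_p = \min\{\alpha_p, \beta_p\}$ and $\delta_p = \max\{\alpha_p, \beta_p\}$. Since either $\gamma_p = \alpha_p$ and $\delta_p = \beta_p$, or $\gamma_p = \beta_p$ and $\delta_p = \alpha_p$, it follows that, for any multiplicative function $f$ and all $m, n \in \mathbb{N}$,

$$f((m, n))\,f([m, n]) = \prod_p f(p^{\gamma_p})f(p^{\delta_p}) = \prod_p f(p^{\alpha_p})f(p^{\beta_p}) = f(m)f(n).$$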


As we saw in §5 of Chapter II, it follows from Proposition II.4 that Euler’s $\varphi$-function is multiplicative. Also, the functions $i : \mathbb{N} \to \mathbb{C}$ and $j : \mathbb{N} \to \mathbb{C}$, defined by

$$i(n) = 1, \qquad j(n) = n \quad \text{for every } n \in \mathbb{N},$$

are obviously multiplicative. Further examples of multiplicative functions can be constructed with the aid of the next two propositions.


Proposition 26 If f,g : N — C are multiplicative functions, then their Dirichlet 
producth = f * g is also multiplicative. 


Proof We have

$$h(n) = \sum_{d|n} f(d) g(n/d).$$

Suppose $n = n'n''$, where $n'$ and $n''$ are relatively prime. Then, by Proposition II.4,

$$\begin{aligned} h(n) &= \sum_{d'|n',\, d''|n''} f(d'd'')\, g(n'n''/d'd'') \\ &= \sum_{d'|n',\, d''|n''} f(d') f(d'') g(n'/d') g(n''/d'') \\ &= \sum_{d'|n'} f(d') g(n'/d') \sum_{d''|n''} f(d'') g(n''/d'') = h(n') h(n''). \end{aligned}$$
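Proposition 26 can be checked numerically with the Python sketch below. The helper names are hypothetical. It forms the Dirichlet product of the multiplicative functions $i$ and $j$ and tests it for multiplicativity on all coprime pairs whose product is at most a chosen bound. A finite check of this kind illustrates the proposition but of course does not prove it.

```python
from math import gcd

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def dirichlet(f, g):
    """Return the Dirichlet product h = f * g, h(n) = sum over d | n of f(d) g(n/d)."""
    return lambda n: sum(f(d) * g(n // d) for d in divisors(n))

def is_multiplicative_up_to(h, limit):
    """Check h(1) = 1 and h(mn) = h(m) h(n) for coprime m, n > 1 with mn <= limit."""
    return h(1) == 1 and all(
        h(m * n) == h(m) * h(n)
        for m in range(2, limit + 1)
        for n in range(2, limit // m + 1)
        if gcd(m, n) == 1
    )

i = lambda n: 1          # i(n) = 1
j = lambda n: n          # j(n) = n
h = dirichlet(i, j)      # h(n) is the sum of the divisors of n
print(h(12), is_multiplicative_up_to(h, 300))
```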


Proposition 27 If $f:\mathbb{N}\to\mathbb{C}$ is a multiplicative function, then its Dirichlet inverse $f^{-1}:\mathbb{N}\to\mathbb{C}$ is also multiplicative.


Proof Assume on the contrary that $g := f^{-1}$ is not multiplicative and let $n', n''$ be relatively prime positive integers such that $g(n'n'') \neq g(n')g(n'')$. We suppose $n', n''$ chosen so that the product $n = n'n''$ is minimal. Since $f$ is multiplicative, $f(1) = 1$ and hence $g(1) = 1$. Consequently $n' > 1$, $n'' > 1$ and

$$0 = \sum_{d'|n'} g(d') f(n'/d') = \sum_{d''|n''} g(d'') f(n''/d'') = \sum_{d|n} g(d) f(n/d).$$

But, by the minimality of $n$ and the multiplicativity of $f$,

$$\begin{aligned} \sum_{d|n} g(d) f(n/d) &= g(n) f(1) + \sum_{d'|n',\, d''|n'',\, d'd''<n} g(d'd'') f(n'n''/d'd'') \\ &= g(n) + \sum_{d'|n',\, d''|n'',\, d'd''<n} g(d') g(d'') f(n'/d') f(n''/d'') \\ &= g(n) - g(n')g(n'') + \sum_{d'|n'} g(d') f(n'/d') \cdot \sum_{d''|n''} g(d'') f(n''/d'') \\ &= g(n) - g(n')g(n''). \end{aligned}$$

Thus we have a contradiction.


It follows from Propositions 26 and 27 that under Dirichlet multiplication the multiplicative functions form a subgroup of the group of all functions $f:\mathbb{N}\to\mathbb{C}$ with $f(1) \neq 0$. The further subgroup generated by $i$ and $j$ contains some interesting functions. Let $\tau(n)$ denote the number of positive divisors of $n$, and let $\sigma(n)$ denote the sum of the positive divisors of $n$:

$$\tau(n) = \sum_{d|n} 1, \quad \sigma(n) = \sum_{d|n} d.$$

In other words,

$$\tau = i * i, \quad \sigma = i * j,$$

and hence, by Proposition 26, $\tau$ and $\sigma$ are multiplicative functions. Thus they are uniquely determined by their values at the prime powers. But if $p$ is prime and $\alpha \in \mathbb{N}$, the divisors of $p^\alpha$ are $1, p, \ldots, p^\alpha$ and hence

$$\tau(p^\alpha) = \alpha + 1, \quad \sigma(p^\alpha) = (p^{\alpha+1} - 1)/(p - 1).$$
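Multiplicativity, together with the prime-power formulas for $\tau$ and $\sigma$, gives a fast way to evaluate both functions. The Python sketch below assumes trial-division factorization is adequate for the small inputs used here. It compares both functions against direct enumeration of divisors. The function names are our own.

```python
def factorize(n):
    """Return {p: alpha} with n = prod p**alpha, by trial division."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def tau(n):
    """Number of divisors, using tau(p^a) = a + 1 and multiplicativity."""
    result = 1
    for p, a in factorize(n).items():
        result *= a + 1
    return result

def sigma(n):
    """Sum of divisors, using sigma(p^a) = (p^(a+1) - 1)/(p - 1) and multiplicativity."""
    result = 1
    for p, a in factorize(n).items():
        result *= (p ** (a + 1) - 1) // (p - 1)
    return result

# Compare with direct enumeration of the divisors.
for n in range(1, 1000):
    divs = [d for d in range(1, n + 1) if n % d == 0]
    assert tau(n) == len(divs) and sigma(n) == sum(divs)
print(tau(360), sigma(360))   # 360 = 2^3 * 3^2 * 5
```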
By Proposition II.24, Euler's $\varphi$-function satisfies $i * \varphi = j$. Thus $\varphi = i^{-1} * j$, and Propositions 26 and 27 provide a new proof that Euler's $\varphi$-function is multiplicative. Since

$$\tau * \varphi = i * i * \varphi = i * j = \sigma,$$

we also obtain the new relation

$$\sigma(n) = \sum_{d|n} \tau(n/d)\, \varphi(d).$$
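Both $i * \varphi = j$ and $\tau * \varphi = \sigma$ can be checked numerically. This Python sketch deliberately computes $\varphi$, $\tau$ and $\sigma$ by brute force, so the check does not rely on multiplicativity. The function names are our own.

```python
from math import gcd

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def phi(n):
    """Euler's phi by direct count of the k in 1..n coprime to n."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def tau(n):
    return len(divisors(n))

def sigma(n):
    return sum(divisors(n))

for n in range(1, 300):
    # i * phi = j (Proposition II.24) and tau * phi = sigma
    assert sum(phi(d) for d in divisors(n)) == n
    assert sum(tau(n // d) * phi(d) for d in divisors(n)) == sigma(n)
```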


The Möbius function $\mu:\mathbb{N}\to\mathbb{C}$ is defined to be the Dirichlet inverse $i^{-1}$. Thus $\mu * i = \delta$ or, in other words,

$$\sum_{d|n} \mu(d) = 1 \text{ or } 0 \quad \text{according as } n = 1 \text{ or } n > 1.$$
dln 


Instead of this inductive definition, we may explicitly characterize the Möbius function in the following way:

Proposition 28 For any $n \in \mathbb{N}$,

$$\mu(n) = \begin{cases} 1 & \text{if } n = 1, \\ (-1)^s & \text{if } n \text{ is a product of } s \text{ distinct primes}, \\ 0 & \text{if } n \text{ is divisible by the square of a prime}. \end{cases}$$

Proof It is trivial that $\mu(1) = 1$. Suppose $p$ is prime and $\alpha \in \mathbb{N}$. Since the divisors of $p^\alpha$ are $1, p, \ldots, p^\alpha$, we have

$$\mu(1) + \mu(p) + \cdots + \mu(p^\alpha) = 0.$$

Since this holds for all $\alpha \in \mathbb{N}$, it follows that $\mu(p) = -\mu(1) = -1$, whereas $\mu(p^\alpha) = 0$ if $\alpha > 1$. Since the Möbius function is multiplicative, by Proposition 27, the general formula follows.
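Proposition 28 gives a direct way to compute $\mu(n)$ by trial division. In the Python sketch below, the function name is hypothetical. A repeated prime factor returns 0 at once; otherwise the distinct prime factors are counted. The loop then confirms the defining property $\mu * i = \delta$ for small $n$.

```python
def mobius(n):
    """mu(n) via Proposition 28: 0 if p^2 | n for some prime p, else (-1)^s."""
    s, p = 0, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0        # the square of p divides n
            s += 1
        p += 1
    if n > 1:
        s += 1                  # what remains of n is a single prime
    return (-1) ** s

# The defining property: the sum of mu(d) over d | n is 1 for n = 1 and 0 for n > 1.
for n in range(1, 500):
    total = sum(mobius(d) for d in range(1, n + 1) if n % d == 0)
    assert total == (1 if n == 1 else 0)
print([mobius(n) for n in range(1, 13)])
```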


The function defined by the statement of Proposition 28 had already appeared in work of Euler (1748), but Möbius (1832) discovered the basic property which we have adopted as a definition. From this property we can easily derive the Möbius inversion formula:

Proposition 29 For any function $f:\mathbb{N}\to\mathbb{C}$, if $\hat{f}:\mathbb{N}\to\mathbb{C}$ is defined by

$$\hat{f}(n) = \sum_{d|n} f(d),$$

then

$$f(n) = \sum_{d|n} \hat{f}(d)\, \mu(n/d) = \sum_{d|n} \hat{f}(n/d)\, \mu(d).$$

Furthermore, for any function $\hat{f}:\mathbb{N}\to\mathbb{C}$ there is a unique function $f:\mathbb{N}\to\mathbb{C}$ such that $\hat{f}(n) = \sum_{d|n} f(d)$ for every $n \in \mathbb{N}$.
Proof Let $f:\mathbb{N}\to\mathbb{C}$ be given and put $\hat{f} = f * i$. Then

$$\hat{f} * \mu = f * i * \mu = f * \delta = f.$$

Conversely, let $\hat{f}:\mathbb{N}\to\mathbb{C}$ be given and put $f = \hat{f} * \mu$. Then $f * i = \hat{f} * \delta = \hat{f}$. Moreover, by the first part of the proof, this is the only possible choice for $f$.
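The two convolutions in the proof, $\hat f = f * i$ and $f = \hat f * \mu$, can be written as a Python sketch. The function names are hypothetical, and `mobius` follows Proposition 28. Inverting $\hat f = j$ recovers Euler's $\varphi$, in line with the formula obtained in the next paragraph.

```python
def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def mobius(n):
    """mu(n) as characterized in Proposition 28."""
    s, p = 0, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0          # a square of a prime divided n
            s += 1
        p += 1
    return (-1) ** (s + (n > 1))

def summatory(f):
    """f_hat = f * i, i.e. f_hat(n) is the sum of f(d) over d | n."""
    return lambda n: sum(f(d) for d in divisors(n))

def mobius_inversion(f_hat):
    """Recover f = f_hat * mu, i.e. f(n) = sum of f_hat(d) mu(n/d) over d | n."""
    return lambda n: sum(f_hat(d) * mobius(n // d) for d in divisors(n))

f = lambda n: n * n + 3 * n        # an arbitrary test function
f_again = mobius_inversion(summatory(f))
assert all(f_again(n) == f(n) for n in range(1, 200))

# Applied to f_hat = j, i.e. f_hat(n) = n, inversion returns Euler's phi.
phi = mobius_inversion(lambda n: n)
print([phi(n) for n in range(1, 13)])
```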


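If we apply Proposition 29 to Euler's $\varphi$-function then, by Proposition II.24, we obtain the formula

$$\varphi(n) = n \sum_{d|n} \mu(d)/d.$$

In particular, if $n = p^\alpha$, where $p$ is prime and $\alpha \in \mathbb{N}$, then

$$\varphi(p^\alpha) = \mu(1) p^\alpha + \mu(p) p^{\alpha-1} = p^\alpha (1 - 1/p).$$

Since $\varphi$ is multiplicative, we recover in this way the formula

$$\varphi(n) = n \prod_{p|n} (1 - 1/p) \quad \text{for every } n \in \mathbb{N}.$$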
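The product formula gives an exact integer algorithm for $\varphi(n)$: multiplying the running value $r$ by $1 - 1/p$ amounts to replacing $r$ by $(r/p)(p-1)$. The Python sketch below uses names of our own. It is checked against the definition of $\varphi(n)$ as a count of residues coprime to $n$.

```python
from math import gcd

def phi(n):
    """phi(n) = n * prod over primes p | n of (1 - 1/p), in exact integer arithmetic."""
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result = result // p * (p - 1)   # multiply by (1 - 1/p)
        p += 1
    if m > 1:                                # one remaining prime factor m
        result = result // m * (m - 1)
    return result

def phi_by_counting(n):
    """Number of k with 1 <= k <= n and gcd(k, n) = 1."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

assert all(phi(n) == phi_by_counting(n) for n in range(1, 500))
print(phi(36), phi(97), phi(1000))
```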


The $\sigma$-function arises in the study of perfect numbers, to which the Pythagoreans attached much significance. A positive integer $n$ is said to be perfect if it is the sum of its (positive) divisors other than itself, i.e. if $\sigma(n) = 2n$.

For example, 6 and 28 are perfect, since

$$6 = 1 + 2 + 3, \quad 28 = 1 + 2 + 4 + 7 + 14.$$


It is an age-old conjecture that there are no odd perfect numbers. However, the even 
perfect numbers are characterized by the following result: 


Proposition 30 An even positive integer is perfect if and only if it has the form $2^t(2^{t+1} - 1)$, where $t \in \mathbb{N}$ and $2^{t+1} - 1$ is prime.


Proof Let $n$ be any even positive integer and write $n = 2^t m$, where $t \geq 1$ and $m$ is odd. Then, since $\sigma$ is multiplicative, $\sigma(n) = d\,\sigma(m)$, where

$$d := \sigma(2^t) = 2^{t+1} - 1.$$

If $m = d$ and $d$ is prime, then $\sigma(m) = 1 + d = 2^{t+1}$ and consequently

$$\sigma(n) = 2^{t+1} m = 2n.$$

On the other hand, if $\sigma(n) = 2n$, then $d\,\sigma(m) = 2^{t+1} m$. Since $d$ is odd, it follows that $m = dq$ for some $q \in \mathbb{N}$. Hence

$$\sigma(m) = 2^{t+1} q = (1 + d) q = q + m.$$

Since $d > 1$, the numbers $q$ and $m$ are distinct divisors of $m$, so $q$ is the only proper divisor of $m$. Hence $q = 1$ and $m = d$ is prime.
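Proposition 30 can be confirmed for small even numbers by direct search. The Python sketch below uses names and a bound `LIMIT` of our own choosing. It lists the even perfect numbers below 10 000 and compares them with the numbers $2^t(2^{t+1}-1)$ for which $2^{t+1}-1$ is prime.

```python
def sigma(n):
    """Sum of the positive divisors of n, pairing each d with n // d."""
    total, d = 0, 1
    while d * d <= n:
        if n % d == 0:
            total += d
            if d != n // d:
                total += n // d
        d += 1
    return total

def is_prime(n):
    return n > 1 and all(n % k for k in range(2, int(n ** 0.5) + 1))

LIMIT = 10_000
even_perfect = [n for n in range(2, LIMIT, 2) if sigma(n) == 2 * n]
euclid_form = sorted(
    2 ** t * (2 ** (t + 1) - 1)
    for t in range(1, 14)
    if is_prime(2 ** (t + 1) - 1) and 2 ** t * (2 ** (t + 1) - 1) < LIMIT
)
print(even_perfect, euclid_form)
```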
The sufficiency of the condition in Proposition 30 was proved in Euclid's Elements (Book IX, Proposition 36). The necessity of the condition was proved over two thousand years later by Euler. The condition is quite restrictive. In the first place, if $2^m - 1$ is prime for some $m \in \mathbb{N}$, then $m$ must itself be prime. For, if $m = rs$, where $1 < r < m$, then with $a = 2^s$ we have

$$2^m - 1 = a^r - 1 = (a - 1)(a^{r-1} + a^{r-2} + \cdots + 1),$$

and both factors exceed 1.


A prime of the form $M_p := 2^p - 1$ is said to be a Mersenne prime in honour of Mersenne (1644), who gave a list of all primes $p \leq 257$ for which, he claimed, $M_p$ was prime. However, he included two values of $p$ for which $M_p$ is composite and omitted three values of $p$ for which $M_p$ is prime. The correct list is now known to be

$$p = 2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127.$$

The first four even perfect numbers, namely 6, 28, 496 and 8128, which correspond to the values $p = 2, 3, 5$ and $7$, were known to the ancient Greeks.

That $M_{11}$ is not prime follows from $2^{11} - 1 = 2047 = 23 \times 89$. The factor 23 is not found simply by guesswork. It was already known to Fermat (1640) that if $p$ is an odd prime, then any divisor of $M_p$ is congruent to 1 mod $2p$. It is sufficient to establish this for prime divisors. But if $q$ is a prime divisor of $M_p$, then $2^p \equiv 1 \bmod q$. Hence the order of 2 in $\mathbb{F}_q^\times$ divides $p$ and, since it is not 1, it must be exactly $p$. Hence, by Lemma II.31 with $G = \mathbb{F}_q^\times$, $p$ divides $q - 1$. Thus $q \equiv 1 \bmod p$ and actually $q \equiv 1 \bmod 2p$, since $q$ is necessarily odd.
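Fermat's observation cuts the trial division needed to factor $M_p$ by a factor of about $p$, since only the candidates $q = 2kp + 1$ need be tried. Below is a minimal Python sketch of this search; the function name and the `limit` parameter are hypothetical. Every divisor of $M_p$ is $\equiv 1 \bmod 2p$, so the first candidate found to divide $M_p$ is its least prime factor.

```python
def smallest_factor_of_mersenne(p, limit=10 ** 7):
    """For an odd prime p, search for the least prime factor of M_p = 2^p - 1,
    testing only q = 2kp + 1 (every divisor of M_p is 1 mod 2p).
    Returns None if no factor q <= min(limit, sqrt(M_p)) exists."""
    m = 2 ** p - 1
    q = 2 * p + 1
    while q <= limit and q * q <= m:
        if m % q == 0:
            return q      # the first divisor found is the least prime factor
        q += 2 * p
    return None

print(smallest_factor_of_mersenne(11), smallest_factor_of_mersenne(23),
      smallest_factor_of_mersenne(13))
```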

The least 39 Mersenne primes are now known. The hunt for more uses thousands 
of linked personal computers and the following test, which was stated by Lucas (1878), 
but first completely proved by D.H. Lehmer (1930): 


Proposition 31 Define the sequence $(S_n)$ recurrently by

$$S_1 = 4, \quad S_{n+1} = S_n^2 - 2 \quad (n \geq 1).$$

Then, for any odd prime $p$, $M_p := 2^p - 1$ is prime if and only if it divides $S_{p-1}$.
Proof Put

$$\omega = 2 + \sqrt{3}, \quad \omega' = 2 - \sqrt{3}.$$

Since $\omega\omega' = 1$, it is easily shown by induction that

$$S_n = \omega^{2^{n-1}} + \omega'^{2^{n-1}} \quad (n \geq 1).$$

Let $q$ be a prime and let $J$ denote the set of all real numbers of the form $a + b\sqrt{3}$, where $a, b \in \mathbb{Z}$. Evidently $J$ is a commutative ring. By identifying two elements $a + b\sqrt{3}$ and $\tilde{a} + \tilde{b}\sqrt{3}$ of $J$ when $a \equiv \tilde{a}$ and $b \equiv \tilde{b} \bmod q$, we obtain a finite commutative ring $J_q$ containing $q^2$ elements. The set $J_q^\times$ of all invertible elements of $J_q$ is a commutative group containing at most $q^2 - 1$ elements, since $0 \notin J_q^\times$.
Suppose first that $M_p$ divides $S_{p-1}$ and assume that $M_p$ is composite. If $q$ is the least prime divisor of $M_p$, then $q^2 \leq M_p$ and $q \neq 2$. By hypothesis,

$$\omega^{2^{p-2}} + \omega'^{2^{p-2}} \equiv 0 \bmod q.$$

Now consider $\omega$ and $\omega'$ as elements of $J_q$. By multiplying by $\omega^{2^{p-2}}$, we obtain

$$\omega^{2^{p-1}} = -1 \quad \text{and hence} \quad \omega^{2^p} = 1.$$

Thus $\omega \in J_q^\times$ and the order of $\omega$ in $J_q^\times$ is exactly $2^p$. Hence

$$2^p \leq q^2 - 1 \leq M_p - 1 = 2^p - 2,$$

which is a contradiction.
Suppose next that $M_p = q$ is prime. Then $q \equiv -1 \bmod 8$, since $p \geq 3$. Since $(2/q) = (-1)^{(q^2-1)/8}$, it follows that 2 is a quadratic residue of $q$. Thus there exists an integer $a$ such that

$$a^2 \equiv 2 \bmod q.$$

Furthermore $q \equiv 1 \bmod 3$, since $2^2 \equiv 1$ and hence $2^{p-1} \equiv 1 \bmod 3$. Thus $q$ is a quadratic residue of 3. Since $q \equiv -1 \bmod 4$, it follows from the law of quadratic reciprocity that 3 is a quadratic nonresidue of $q$. Hence, by Euler's criterion (Proposition II.28),

$$3^{(q-1)/2} \equiv -1 \bmod q.$$

Consider the element $\tau = a^{q-2}(1 + \sqrt{3})$ of $J_q$. We have

$$\tau^2 = a^{2(q-2)}(1 + \sqrt{3})^2 = 2^{q-2} \cdot 2\omega = 2^{q-1}\omega = \omega,$$

since $2^{q-1} \equiv 1 \bmod q$. On the other hand,

$$(1 + \sqrt{3})^q = 1 + 3^{(q-1)/2}\sqrt{3} = 1 - \sqrt{3}$$

and hence

$$\tau^q = a^{q-2}(1 - \sqrt{3}).$$

Consequently,

$$\omega^{(q+1)/2} = \tau^{q+1} = a^{2(q-2)}(1 - \sqrt{3})(1 + \sqrt{3}) = 2^{q-2} \cdot (-2) = -1.$$

Multiplying by $\omega'^{(q+1)/4}$, we obtain $\omega^{(q+1)/4} = -\omega'^{(q+1)/4}$. In other words, since $(q+1)/4 = 2^{p-2}$,

$$S_{p-1} = \omega^{2^{p-2}} + \omega'^{2^{p-2}} \equiv 0 \bmod q.$$
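Proposition 31 is the Lucas–Lehmer test. In practice one works with residues modulo $M_p$, so the numbers involved never exceed $M_p^2$. Below is a minimal Python sketch; the function names are our own. For the odd primes $p \le 257$ it should reproduce the list of Mersenne exponents given earlier in this section, apart from $p = 2$.

```python
def lucas_lehmer(p):
    """Proposition 31, for an odd prime p: M_p = 2^p - 1 is prime iff M_p | S_{p-1},
    where S_1 = 4 and S_{n+1} = S_n^2 - 2. Only residues mod M_p are kept."""
    m = 2 ** p - 1
    s = 4                          # S_1
    for _ in range(p - 2):         # produces S_2, ..., S_{p-1}
        s = (s * s - 2) % m
    return s == 0

def is_prime(n):
    """Trial division; adequate for the small numbers used here."""
    return n > 1 and all(n % k for k in range(2, int(n ** 0.5) + 1))

odd_primes = [p for p in range(3, 258) if is_prime(p)]
mersenne_exponents = [p for p in odd_primes if lucas_lehmer(p)]
print(mersenne_exponents)
```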


It is conjectured that there are infinitely many Mersenne primes, and hence infinitely many even perfect numbers. A heuristic argument of Gillies (1964), as modified by Wagstaff (1983), suggests that the number of primes $p \leq x$ for which $M_p$ is prime is asymptotic to $(e^\gamma/\log 2)\log x$, where $\gamma$ is Euler's constant (Chapter IX, §4), so that $e^\gamma/\log 2 \approx 2.5695$.
We turn now from the primality of $2^m - 1$ to the primality of $2^m + 1$. It is easily seen that if $2^m + 1$ is prime for some $m \in \mathbb{N}$, then $m$ must be a power of 2. For, if $m = rs$, where $r > 1$ is odd, then with $a = 2^s$ we have

$$2^m + 1 = a^r + 1 = (a + 1)(a^{r-1} - a^{r-2} + \cdots + 1).$$

Put $F_n := 2^{2^n} + 1$. Thus, in particular,

$$F_0 = 3, \quad F_1 = 5, \quad F_2 = 17, \quad F_3 = 257, \quad F_4 = 65537.$$

Evidently $F_{n+1} - 2 = (F_n - 2)F_n$, from which it follows by induction that

$$F_n - 2 = F_0 F_1 \cdots F_{n-1} \quad (n \geq 1).$$

Since $F_n$ is odd, this implies that $(F_m, F_n) = 1$ if $m \neq n$. As a byproduct, we have a proof that there are infinitely many primes.
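The identity $F_n - 2 = F_0F_1\cdots F_{n-1}$ and the pairwise coprimality of the Fermat numbers are easy to check by machine for small $n$, since Python integers have arbitrary precision. Below is a short sketch using `math.prod` and `math.gcd`. It assumes Python 3.8 or later, which is required for `math.prod`.

```python
from itertools import combinations
from math import gcd, prod

F = [2 ** (2 ** n) + 1 for n in range(10)]      # F_0, ..., F_9

for n in range(1, 10):
    assert F[n] - 2 == prod(F[:n])               # F_n - 2 = F_0 F_1 ... F_{n-1}
assert all(gcd(a, b) == 1 for a, b in combinations(F, 2))
print(F[:5])
```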

It is easily verified that $F_n$ itself is prime for $n \leq 4$. It was conjectured by Fermat that the ‘Fermat numbers’ $F_n$ are all prime. However, this was disproved by Euler, who showed that 641 divides $F_5$. In fact

$$641 = 5 \cdot 2^7 + 1 = 5^4 + 2^4.$$

Thus $5 \cdot 2^7 \equiv -1 \bmod 641$ and hence $2^{32} = 2^4 \cdot 2^{28} \equiv -5^4 \cdot 2^{28} = -(5 \cdot 2^7)^4 \equiv -(-1)^4 = -1 \bmod 641$.
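Euler's chain of congruences can be replayed step by step in Python. Each assertion below mirrors one step of the argument, and the built-in three-argument `pow` confirms the conclusion directly. This is a minimal sketch.

```python
M = 641
assert M == 5 * 2 ** 7 + 1 == 5 ** 4 + 2 ** 4
assert (5 * 2 ** 7) % M == M - 1              # 5 * 2^7 ≡ -1 (mod 641)
assert 2 ** 4 % M == (-5 ** 4) % M            # 2^4 ≡ -5^4 (mod 641)
assert (-(5 * 2 ** 7) ** 4) % M == M - 1      # -(5 * 2^7)^4 ≡ -1 (mod 641)
assert pow(2, 32, M) == M - 1                 # hence 2^32 ≡ -1, so 641 | F_5
print(divmod(2 ** 32 + 1, M))
```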
Fermat may have been as wrong as possible, since no $F_n$ with $n > 4$ is known to be prime, although many have been proved to be composite. The Fermat numbers which are prime found an unexpected application to the construction of regular polygons by ruler and compass, the only instruments which Euclid allowed himself. It was shown by Gauss, at the age of 19, that a regular polygon of $m$ sides can be constructed by ruler and compass if the order $\varphi(m)$ of $\mathbb{Z}_{(m)}^\times$ is a power of 2. It follows from the formula $\varphi(p^\alpha) = p^{\alpha-1}(p - 1)$, and the multiplicative nature of Euler's function, that $\varphi(m)$ is a power of 2 if and only if $m$ has the form $2^k \cdot p_1 \cdots p_s$, where $k \geq 0$ and $p_1, \ldots, p_s$ are distinct Fermat primes. (Wantzel (1837) showed that a regular polygon of $m$ sides cannot be constructed by ruler and compass unless $m$ has this form.) Gauss's result, in which he took particular pride, was a forerunner of Galois theory and is today usually established as an application of that theory.
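The arithmetic criterion can be checked by computer for small $m$. In this Python sketch the names are hypothetical. The list `FERMAT_PRIMES` contains only the five known Fermat primes $F_0, \ldots, F_4$. This assumption is adequate for every $m < F_5$: a prime $p \le m$ with $p - 1$ a power of 2 must then be one of $F_0, \ldots, F_4$. The code compares "$\varphi(m)$ is a power of 2" with "$m = 2^k p_1 \cdots p_s$ for distinct Fermat primes $p_i$".

```python
FERMAT_PRIMES = [3, 5, 17, 257, 65537]          # F_0, ..., F_4

def phi(n):
    """Euler's function via n * prod(1 - 1/p), in integer arithmetic."""
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p
        p += 1
    if m > 1:
        result -= result // m
    return result

def is_power_of_two(x):
    return x > 0 and x & (x - 1) == 0

def gauss_form(m):
    """True iff m = 2^k p_1 ... p_s with distinct primes p_i from FERMAT_PRIMES."""
    while m % 2 == 0:
        m //= 2
    for p in FERMAT_PRIMES:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return False                  # a Fermat prime occurs twice
    return m == 1

assert all(is_power_of_two(phi(m)) == gauss_form(m) for m in range(1, 10000))
print([m for m in range(3, 50) if gauss_form(m)])
```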

The factor 641 of $F_5$ is not found simply by guesswork. Indeed, we now show that any divisor of $F_n$ must be congruent to 1 mod $2^{n+1}$. It is sufficient to establish this for prime divisors. But if $p$ is a prime divisor of $F_n$, then $2^{2^n} \equiv -1 \bmod p$ and hence $2^{2^{n+1}} \equiv 1 \bmod p$. Thus the order of 2 in $\mathbb{F}_p^\times$ is exactly $2^{n+1}$. Hence $2^{n+1}$ divides $p - 1$ and $p \equiv 1 \bmod 2^{n+1}$.

With a little more effort we can show that any divisor of $F_n$ must be congruent to 1 mod $2^{n+2}$ if $n > 1$. For, if $p$ is a prime divisor of $F_n$ and $n > 1$, then $p \equiv 1 \bmod 8$ by what we have already proved. Hence, by Proposition II.30, 2 is a quadratic residue of $p$. Thus there exists an integer $a$ such that $a^2 \equiv 2 \bmod p$. Since $a^{2^{n+1}} \equiv -1 \bmod p$ and $a^{2^{n+2}} \equiv 1 \bmod p$, the order of $a$ in $\mathbb{F}_p^\times$ is exactly $2^{n+2}$ and hence $2^{n+2}$ divides $p - 1$.

It follows from the preceding result that 641 is the first possible candidate for a prime divisor of $F_5$, since $128k + 1$ is not prime for $k = 1, 3, 4$ and $257 = F_3$ is relatively prime to $F_5$.
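The search for a factor of $F_5$ described above is easily mechanized. In the Python sketch below the helper names are our own. The candidates have the form $128k + 1$, because $2^{n+2} = 128$ for $n = 5$.

```python
from itertools import count

n = 5
F5 = 2 ** (2 ** n) + 1
step = 2 ** (n + 2)             # prime divisors of F_n are 1 mod 2^(n+2) for n > 1

def is_prime(q):
    return q > 1 and all(q % k for k in range(2, int(q ** 0.5) + 1))

# The first few candidates 128k + 1 and whether each is prime.
candidates = [(k, step * k + 1, is_prime(step * k + 1)) for k in range(1, 6)]
print(candidates)

# The first candidate that actually divides F_5.
first = next(q for q in count(step + 1, step) if F5 % q == 0)
print(first, F5 // first)
```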
The hunt for Fermat primes today uses supercomputers and the following test due 
to Pépin (1877): 




Proposition 32 If $m > 1$, then $N := 2^m + 1$ is prime if and only if $3^{(N-1)/2} + 1$ is divisible by $N$.

Proof Suppose first that $N$ divides $a^{(N-1)/2} + 1$ for some integer $a$. If $p$ is any prime divisor of $N$, then $a^{(N-1)/2} \equiv -1 \bmod p$ and hence $a^{N-1} \equiv 1 \bmod p$. Thus, since $p$ is necessarily odd, the order of $a$ in $\mathbb{F}_p^\times$ divides $N - 1 = 2^m$, but does not divide $(N - 1)/2 = 2^{m-1}$. Hence the order of $a$ must be exactly $2^m$. Consequently, by Lemma II.31 with $G = \mathbb{F}_p^\times$, $2^m$ divides $p - 1$. Thus

$$2^m \leq p - 1 \leq N - 1 = 2^m,$$

which implies that $N = p$ is prime.

To prove the converse we use the law of quadratic reciprocity. Suppose $N = p$ is prime. Then $p > 3$, since $m > 1$. From $2 \equiv -1 \bmod 3$ we obtain $p \equiv (-1)^m + 1 \bmod 3$. Since $3 \nmid p$, it follows that $p \equiv -1 \bmod 3$. Thus $p$ is a quadratic non-residue of 3. But $p \equiv 1 \bmod 4$, since $m > 1$. Consequently, by the law of quadratic reciprocity, 3 is a quadratic non-residue of $p$. Hence, by Euler's criterion, $3^{(p-1)/2} \equiv -1 \bmod p$.


By means of Proposition 32 it has been shown that $F_{14}$ is composite, even though no nontrivial factors are known!
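Pépin's test needs only one modular exponentiation, which Python's built-in three-argument `pow` performs efficiently. The sketch below implements Proposition 32 for $N = 2^m + 1$; the function names are hypothetical. It cross-checks the test against trial division for small $m$. It then applies the test to the Fermat numbers $F_n$, for which $m = 2^n$.

```python
def pepin(m):
    """Proposition 32: for m > 1, N = 2^m + 1 is prime iff N divides 3^((N-1)/2) + 1."""
    if m <= 1:
        raise ValueError("Proposition 32 assumes m > 1")
    N = 2 ** m + 1
    return pow(3, (N - 1) // 2, N) == N - 1

def is_prime(n):
    return n > 1 and all(n % k for k in range(2, int(n ** 0.5) + 1))

# Cross-check against trial division for small m.
assert all(pepin(m) == is_prime(2 ** m + 1) for m in range(2, 25))

# Fermat numbers F_n = 2^(2^n) + 1 correspond to m = 2^n.
print([n for n in range(1, 12) if pepin(2 ** n)])
```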


4 Linear Diophantine Equations 


A Diophantine equation is an algebraic equation with integer coefficients of which the 
integer solutions are required. The name honours Diophantus of Alexandria (3rd cen- 
tury A.D.), who solved many problems of this type, although the surviving books of 
his Arithmetica do not treat the linear problems with which we will be concerned. 

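We wish to determine integers $x_1, \ldots, x_n$ such that

$$\begin{aligned} a_{11}x_1 + \cdots + a_{1n}x_n &= c_1 \\ a_{21}x_1 + \cdots + a_{2n}x_n &= c_2 \\ &\;\;\vdots \\ a_{m1}x_1 + \cdots + a_{mn}x_n &= c_m, \end{aligned}$$

where the coefficients $a_{jk}$ and the right sides $c_j$ are given integers $(1 \leq j \leq m,\ 1 \leq k \leq n)$. We may write the system, in matrix notation, as

$$Ax = c.$$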


The problem may also be put geometrically. A nonempty set $M \subseteq \mathbb{Z}^m$ is said to be a $\mathbb{Z}$-module, or simply a module, if $a, b \in M$ and $x, y \in \mathbb{Z}$ imply $xa + yb \in M$.

For example, if $a_1, \ldots, a_n$ is a finite subset of $\mathbb{Z}^m$, then the set $M$ of all linear combinations $x_1a_1 + \cdots + x_na_n$ with $x_1, \ldots, x_n \in \mathbb{Z}$ is a module, the module generated by $a_1, \ldots, a_n$. If we take $a_1, \ldots, a_n$ to be the columns of the matrix $A$, then $M$ is the set of all vectors $Ax$ with $x \in \mathbb{Z}^n$ and the system $Ax = c$ is soluble if and only if $c \in M$.




If a module $M$ is generated by the elements $a_1, \ldots, a_n$, then it is also generated by the elements $b_1, \ldots, b_n$, where

$$b_k = u_{1k}a_1 + \cdots + u_{nk}a_n \quad (u_{jk} \in \mathbb{Z};\ 1 \leq j, k \leq n),$$

if the matrix $U = (u_{jk})$ is invertible. Here an $n \times n$ matrix $U$ of integers is said to be invertible if there exists an $n \times n$ matrix $U^{-1}$ of integers such that $U^{-1}U = I_n$ or, equivalently, $UU^{-1} = I_n$.

For example, if $ax + by = 1$, then the matrix

$$\begin{pmatrix} a & b \\ -y & x \end{pmatrix}$$

is invertible, with inverse

$$\begin{pmatrix} x & -b \\ y & a \end{pmatrix}.$$

It may be shown, although we will not use it, that an $n \times n$ matrix $U$ is invertible if and only if its determinant $\det U$ is a unit, i.e. $\det U = \pm 1$. Under matrix multiplication, the set of all invertible $n \times n$ matrices of integers is a group, usually denoted by $GL_n(\mathbb{Z})$.
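The $2 \times 2$ example can be checked numerically. This Python sketch uses hypothetical names. It obtains $x, y$ with $ax + by = 1$ from the extended Euclidean algorithm and multiplies the matrix by its claimed inverse.

```python
def ext_gcd(a, b):
    """Extended Euclid for a, b >= 0: returns (g, x, y) with a*x + b*y = g = gcd(a, b)."""
    x0, y0, x1, y1 = 1, 0, 0, 1
    while b:
        q = a // b
        a, b = b, a - q * b
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return a, x0, y0

def unimodular_pair(a, b):
    """For coprime a, b >= 0, return the matrix of the text and its inverse."""
    g, x, y = ext_gcd(a, b)
    if g != 1:
        raise ValueError("a and b must be coprime")
    U = [[a, b], [-y, x]]
    U_inv = [[x, -b], [y, a]]
    return U, U_inv

def matmul(P, Q):
    return [[sum(P[r][k] * Q[k][c] for k in range(2)) for c in range(2)] for r in range(2)]

U, U_inv = unimodular_pair(17, 5)
print(U, U_inv, matmul(U, U_inv))
```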

To solve the linear Diophantine system Ax = c we replace it by a system By = c, 
where B = AU for some invertible matrix U. The idea is to choose U so that B has 
such a simple form that y can be determined immediately, and then x = Uy. 

We will use the elementary fact that interchanging two columns of a matrix $A$, or adding an integral multiple of one column to another column, is equivalent to postmultiplying $A$ by a suitable invertible matrix $U$. In fact $U$ is obtained by performing the same column operation on the identity matrix $I_n$. In the following discussion ‘matrix’ will mean ‘matrix with entries from $\mathbb{Z}$’.


Proposition 33 If $A = (a_1 \cdots a_n)$ is a $1 \times n$ matrix, then there exists an invertible $n \times n$ matrix $U$ such that

$$AU = (d\ 0 \cdots 0)$$

if and only if $d$ is a greatest common divisor of $a_1, \ldots, a_n$.

Proof Suppose first that there exists such a matrix $U$. Since

$$A = (d\ 0 \cdots 0)U^{-1},$$

$d$ is a common divisor of $a_1, \ldots, a_n$. On the other hand,

$$d = a_1b_1 + \cdots + a_nb_n,$$

where $b_1, \ldots, b_n$ is the first column of $U$. Hence any common divisor of $a_1, \ldots, a_n$ divides $d$. Thus $d$ is a greatest common divisor of $a_1, \ldots, a_n$.
Suppose next that $a_1, \ldots, a_n$ have greatest common divisor $d$. Since there is nothing to do if $n = 1$, we assume $n > 1$ and use induction on $n$. Then if $d'$ is the greatest common divisor of $a_2, \ldots, a_n$, there exists an invertible $(n-1) \times (n-1)$ matrix $V'$ such that

$$(a_2 \cdots a_n)V' = (d'\ 0 \cdots 0).$$

Since $d$ is the greatest common divisor of $a_1$ and $d'$, there exist integers $u, v$ such that $a_1u + d'v = d$. Put $V = I_1 \oplus V'$ and $W = W' \oplus I_{n-2}$, where

$$W' = \begin{pmatrix} u & -d'/d \\ v & a_1/d \end{pmatrix}.$$

Then $V$ and $W$ are invertible, and

$$(a_1\ a_2 \cdots a_n)VW = (a_1\ d'\ 0 \cdots 0)W = (d\ 0 \cdots 0).$$

Thus we can take $U = VW$.
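The proof of Proposition 33 is constructive. The following Python sketch uses hypothetical names and does not follow the induction literally. Instead it runs Euclid's algorithm between the first entry and each later entry in turn. It uses only the two kinds of column operations mentioned before the proposition and records them in $U$. A final sign change of one column, also an invertible operation, makes $d$ non-negative.

```python
def row_reduce_to_gcd(a):
    """Given a 1 x n integer matrix a (a list), return (d, U) with U an invertible
    n x n integer matrix and a U = (d, 0, ..., 0), where d = gcd(a_1, ..., a_n) >= 0.
    Every column operation is applied to both the row and U (which starts as I_n),
    so row == a U holds throughout."""
    n = len(a)
    row = list(a)
    U = [[int(r == c) for c in range(n)] for r in range(n)]

    def add_col(src, dst, q):            # column dst += q * column src
        row[dst] += q * row[src]
        for r in U:
            r[dst] += q * r[src]

    def swap_cols(c1, c2):
        row[c1], row[c2] = row[c2], row[c1]
        for r in U:
            r[c1], r[c2] = r[c2], r[c1]

    for k in range(1, n):                # Euclid's algorithm on entries 0 and k
        while row[k] != 0:
            add_col(k, 0, -(row[0] // row[k]))
            swap_cols(0, k)
    if row[0] < 0:                       # negate column 0 (determinant -1)
        row[0] = -row[0]
        for r in U:
            r[0] = -r[0]
    return row[0], U

def det(M):
    """Determinant by cofactor expansion (fine for small matrices)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** c * M[0][c] * det([r[:c] + r[c + 1:] for r in M[1:]])
               for c in range(len(M)))

a = [6, 10, 15]
d, U = row_reduce_to_gcd(a)
aU = [sum(a[r] * U[r][c] for r in range(len(a))) for c in range(len(a))]
print(d, aU, det(U))
```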


Corollary 34 For any given integers $a_1, \ldots, a_n$, there exists an invertible $n \times n$ matrix $U$ with $a_1, \ldots, a_n$ as its first row if and only if the greatest common divisor of $a_1, \ldots, a_n$ is 1.

Proof An invertible matrix $U$ has $a_1, \ldots, a_n$ as its first row if and only if

$$(a_1\ a_2 \cdots a_n) = (1\ 0 \cdots 0)U,$$

that is, if and only if $(a_1\ a_2 \cdots a_n)U^{-1} = (1\ 0 \cdots 0)$. The result therefore follows from Proposition 33.

If $U$ is invertible, then its transpose is also invertible. It follows that there exists an invertible $n \times n$ matrix with $a_1, \ldots, a_n$ as its first column also if and only if the greatest common divisor of $a_1, \ldots, a_n$ is 1.


Proposition 35 For any $m \times n$ matrix $A$, there exists an invertible $n \times n$ matrix $U$ such that $B = AU$ has the form

$$B = (B_1\ O),$$

where $B_1$ is an $m \times r$ submatrix of rank $r$.

Proof Let $A$ have rank $r$. If $r = n$, there is nothing to do. If $r < n$ and we denote the columns of $A$ by $a_1, \ldots, a_n$, then there exist $x_1, \ldots, x_n \in \mathbb{Z}$, not all zero, such that

$$x_1a_1 + \cdots + x_na_n = O.$$

Moreover, we may assume that $x_1, \ldots, x_n$ have greatest common divisor 1. Then, by Corollary 34, there exists an invertible $n \times n$ matrix $U'$ with $x_1, \ldots, x_n$ as its last column. Hence $A' := AU'$ has its last column zero. If $r < n - 1$, we can apply the same argument to the submatrix formed by the first $n - 1$ columns of $A'$, and so on until we arrive at a matrix $B$ of the required form.
The elements $b_1, \ldots, b_r$ of a module $M$ are said to be a basis for $M$ if they generate $M$ and are linearly independent, i.e. $x_1b_1 + \cdots + x_rb_r = O$ for some $x_1, \ldots, x_r \in \mathbb{Z}$ implies that $x_1 = \cdots = x_r = 0$. If $O$ is the only element of $M$, we say also that $O$ is a basis for $M$.

In geometric terms, Proposition 35 says that any finitely generated module $M \subseteq \mathbb{Z}^m$ has a finite basis, and that a finite set of generators is a basis if and only if its elements are linearly independent over $\mathbb{Q}$. Hence any two bases have the same cardinality.


Proposition 36 For any $m \times n$ matrix $A$, the set $N$ of all $x \in \mathbb{Z}^n$ such that $Ax = O$ is a module with a finite basis.

Proof It is evident that $N$ is a module. By Proposition 35, there exists an invertible $n \times n$ matrix $U$ such that $AU = B = (B_1\ O)$, where $B_1$ is an $m \times r$ submatrix of rank $r$. Hence $By = O$ if and only if the first $r$ coordinates of $y$ vanish. By taking $y$ to be the vector with $k$-th coordinate 1 and all other coordinates 0, for each $k$ such that $r < k \leq n$, we see that the equation $By = O$ has $n - r$ linearly independent solutions $y^{(1)}, \ldots, y^{(n-r)}$ such that all solutions are given by

$$y = z_1y^{(1)} + \cdots + z_{n-r}y^{(n-r)},$$

where $z_1, \ldots, z_{n-r}$ are arbitrary integers. If we put $x^{(j)} = Uy^{(j)}$, it follows that $x^{(1)}, \ldots, x^{(n-r)}$ are a basis for the module $N$.
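Propositions 35 and 36 suggest an algorithm for a basis of the solutions of $Ax = O$. First reduce $A$ by invertible column operations to a matrix whose last $n - r$ columns vanish. Then read off the corresponding columns of $U$. The Python sketch below uses hypothetical names and performs the reduction row by row with Euclid's algorithm. As a by-product, the reduced matrix is in the echelon form defined next.

```python
def column_echelon(A):
    """Return (B, U, r): U is an invertible n x n integer matrix, B = A U is in
    echelon form, and r = rank A, so the last n - r columns of B are zero.
    Row by row, Euclid's algorithm is run on the columns not yet fixed."""
    m, n = len(A), len(A[0])
    B = [row[:] for row in A]
    U = [[int(i == j) for j in range(n)] for i in range(n)]

    def add_col(src, dst, q):          # column dst += q * column src, in B and U
        for M in (B, U):
            for row in M:
                row[dst] += q * row[src]

    def swap_cols(c1, c2):
        for M in (B, U):
            for row in M:
                row[c1], row[c2] = row[c2], row[c1]

    r = 0                              # number of pivot columns found so far
    for i in range(m):
        if r == n:
            break
        for k in range(r + 1, n):
            while B[i][k] != 0:
                add_col(k, r, -(B[i][r] // B[i][k]))
                swap_cols(r, k)
        if B[i][r] != 0:               # row i receives the pivot of column r
            r += 1
    return B, U, r

def kernel_basis(A):
    """Basis of {x in Z^n : A x = 0}: the last n - r columns of U (Proposition 36)."""
    B, U, r = column_echelon(A)
    n = len(U)
    return [[U[i][k] for i in range(n)] for k in range(r, n)]

A = [[1, 2, 3],
     [4, 5, 6]]
basis = kernel_basis(A)
print(basis)
```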


An $m \times n$ matrix $B = (b_{jk})$ will be said to be in echelon form if the following two conditions are satisfied:

(i) $b_{jk} = 0$ for all $j$ if $k > r$;
(ii) $b_{jk} \neq 0$ for some $j$ if $k \leq r$ and, if $m_k$ is the least such $j$, then $1 \leq m_1 < m_2 < \cdots < m_r \leq m$.

Evidently $r = \operatorname{rank} B$.


Proposition 37 For any $m \times n$ matrix $A$, there exists an invertible $n \times n$ matrix $U$ such that $B = AU$ is in echelon form.

Proof By Proposition 35, we may suppose that $A$ has the form $(A_1\ O)$, where $A_1$ is an $m \times r$ submatrix of rank $r$, and by replacing $A_1$ by $A$ we may suppose that $A$ itself has rank $n$. We are going to show that there exists an invertible $n \times n$ matrix $U$ such that, if $AU = B = (b_{jk})$, then $b_{jk} = 0$ for all $j < k$.

If $m = 1$, this follows from Proposition 33. We assume $m > 1$ and use induction on $m$. Then the first $m - 1$ rows of $A$ may be assumed to have already the required triangular form. If $n \leq m$, there is nothing more to do. If $n > m$, we can take $U = I_{m-1} \oplus U'$, where $U'$ is an invertible $(n - m + 1) \times (n - m + 1)$ matrix such that

$$(a_{m,m}\ a_{m,m+1} \cdots a_{m,n})U' = (a'\ 0 \cdots 0).$$

Replacing $B$ by $A$, we now suppose that for $A$ itself we have $a_{jk} = 0$ for all $j < k$. Since $A$ still has rank $n$, each column of $A$ contains a nonzero entry. If the first nonzero entry in the $k$-th column appears in the $m_k$-th row, then $m_k \geq k$. By permuting the columns, if necessary, we may suppose in addition that $m_1 \leq m_2 \leq \cdots \leq m_n$.
Suppose $m_1 = m_2$. Let $a$ and $b$ be the entries in the $m_1$-th row of the first and second columns, and let $d$ be their greatest common divisor. Then $d \neq 0$ and there exist integers $u, v$ such that $au + bv = d$. If we put $U = V \oplus I_{n-2}$, where

$$V = \begin{pmatrix} u & -b/d \\ v & a/d \end{pmatrix},$$

then $U$ is invertible. Moreover, the last $n - 2$ columns of $B = AU$ are the same as in $A$ and the first $m_1 - 1$ entries of the first two columns are still zero. However, $b_{m_1 1} = d$ and $b_{m_1 2} = 0$. By permuting the last $n - 1$ columns, if necessary, we obtain a matrix $A'$, of the same form as $A$, with $m'_1 \leq m'_2 \leq \cdots \leq m'_n$, where $m'_1 = m_1$ and $m'_2 + \cdots + m'_n > m_2 + \cdots + m_n$.

By repeating this process finitely many times, we will obtain a matrix in echelon form.


Corollary 38 If $A$ is an $m \times n$ matrix of rank $m$, then there exists an invertible $n \times n$ matrix $U$ such that $AU = B = (b_{jk})$, where

$$b_{jj} \neq 0, \quad b_{jk} = 0 \text{ if } j < k \quad (1 \leq j \leq m,\ 1 \leq k \leq n).$$


Before proceeding further we consider the uniqueness of the echelon form. Let $T = (t_{jk})$ be any $r \times r$ matrix which is lower triangular and invertible, i.e. $t_{jk} = 0$ if $j < k$ and the diagonal elements $t_{jj}$ are units. It is easily seen that if $U = T \oplus I_{n-r}$, and if $B$ is an echelon form for a matrix $A$ with rank $r$, then $BU$ is also an echelon form for $A$. We will show that all possible echelon forms for $A$ are obtained in this way.

Suppose $B' = BU$ is in echelon form, for some invertible $n \times n$ matrix $U$, and write

$$B = (B_1\ O), \quad B' = (B'_1\ O),$$

where $B_1, B'_1$ are $m \times r$ submatrices. If

$$U = \begin{pmatrix} U_1 & U_2 \\ U_3 & U_4 \end{pmatrix},$$

then from $(B_1\ O)U = (B'_1\ O)$ we obtain $U_2 = O$, since $B_1U_2 = O$ and $B_1$ has rank $r$. Consequently $U_1$ is invertible and we can equally well take $U_3 = O$, $U_4 = I$. Let $b_1, \ldots, b_r$ be the columns of $B_1$ and $b'_1, \ldots, b'_r$ the columns of $B'_1$. If $U_1 = (t_{jk})$, then

$$b'_k = t_{1k}b_1 + \cdots + t_{rk}b_r \quad (1 \leq k \leq r).$$

Taking $k = 1$, we obtain $m'_1 \geq m_1$ and so, by symmetry, $m'_1 = m_1$. Since $m'_k > m'_1 = m_1$ for $k > 1$, it follows that $t_{1k} = 0$ for $k > 1$. Taking $k = 2$, we now obtain in the same way $m'_2 = m_2$. Proceeding in this manner, we see that $U_1$ is a lower triangular matrix.
We return now to the linear Diophantine equation

$$Ax = c.$$

The set of all $c \in \mathbb{Z}^m$ for which there exists a solution $x \in \mathbb{Z}^n$ is evidently a module $L \subseteq \mathbb{Z}^m$. If $U$ is an invertible matrix such that $B = AU$ is in echelon form, then $x$ is a solution of the given system if and only if $y = U^{-1}x$ is a solution of the transformed system

$$By = c.$$

But the latter system is soluble if and only if $c$ is an integral linear combination of the first $r$ columns $b_1, \ldots, b_r$ of $B$. Since $b_1, \ldots, b_r$ are linearly independent, they form a basis for $L$.

To determine if a given system $Ax = c$ is soluble, we may use the customary methods of linear algebra over the field $\mathbb{Q}$ of rational numbers to test if $c$ is linearly dependent on $b_1, \ldots, b_r$; then express it as a linear combination of $b_1, \ldots, b_r$, and finally check that the coefficients $y_1, \ldots, y_r$ are all integers. The solutions of the original system are given by $x = Uy$, where $y$ is any vector in $\mathbb{Z}^n$ whose first $r$ coordinates are $y_1, \ldots, y_r$.
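The procedure just described can be sketched in Python as follows; the names are hypothetical. The function `column_echelon` reduces $A$ to echelon form $B = AU$ by column operations. Because $B$ is in echelon form, the rational coefficients $y_1, \ldots, y_r$ are found by forward substitution along the pivot rows $m_1 < \cdots < m_r$. The system is declared insoluble if one of these coefficients is not an integer, or if $c$ is not a combination of $b_1, \ldots, b_r$ at all. The free coordinates $y_{r+1}, \ldots, y_n$ are set to 0, which gives one particular solution.

```python
from fractions import Fraction

def column_echelon(A):
    """Return (B, U, r) with U invertible, B = A U in echelon form, r = rank A."""
    m, n = len(A), len(A[0])
    B = [row[:] for row in A]
    U = [[int(i == j) for j in range(n)] for i in range(n)]

    def add_col(src, dst, q):          # column dst += q * column src, in B and U
        for M in (B, U):
            for row in M:
                row[dst] += q * row[src]

    def swap_cols(c1, c2):
        for M in (B, U):
            for row in M:
                row[c1], row[c2] = row[c2], row[c1]

    r = 0
    for i in range(m):
        if r == n:
            break
        for k in range(r + 1, n):      # Euclid's algorithm on row i, columns r..n-1
            while B[i][k] != 0:
                add_col(k, r, -(B[i][r] // B[i][k]))
                swap_cols(r, k)
        if B[i][r] != 0:
            r += 1
    return B, U, r

def solve_linear_diophantine(A, c):
    """Return one integer solution x of A x = c, or None if there is none."""
    m, n = len(A), len(A[0])
    B, U, r = column_echelon(A)
    y = []
    for k in range(r):
        piv = next(j for j in range(m) if B[j][k] != 0)    # the pivot row m_k
        val = Fraction(c[piv] - sum(B[piv][l] * y[l] for l in range(k)), B[piv][k])
        if val.denominator != 1:
            return None          # c is a rational, but not an integral, combination
        y.append(int(val))
    if any(sum(B[j][l] * y[l] for l in range(r)) != c[j] for j in range(m)):
        return None              # c is not a combination of b_1, ..., b_r at all
    y += [0] * (n - r)           # the remaining coordinates may be any integers
    return [sum(U[i][l] * y[l] for l in range(n)) for i in range(n)]

A = [[1, 2, 3],
     [4, 5, 6]]
print(solve_linear_diophantine(A, [6, 15]), solve_linear_diophantine(A, [0, 1]))
```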

If $M_1$ and $M_2$ are modules in $\mathbb{Z}^m$, their intersection $M_1 \cap M_2$ is again a module. The set of all $a \in \mathbb{Z}^m$ such that $a = a_1 + a_2$ for some $a_1 \in M_1$ and $a_2 \in M_2$ is also a module, which will be denoted by $M_1 + M_2$ and called the sum of $M_1$ and $M_2$. If $M_1$ and $M_2$ are finitely generated, then $M_1 + M_2$ is evidently finitely generated. We will show that $M_1 \cap M_2$ is also finitely generated.

Since $M_1 + M_2$ is a finitely generated module in $\mathbb{Z}^m$, it has a basis $a_1, \ldots, a_n$. Since $M_1$ and $M_2$ are contained in $M_1 + M_2$, their generators $b_1, \ldots, b_p$ and $c_1, \ldots, c_q$ have the form

$$b_i = \sum_{k=1}^n \beta_{ki}a_k, \quad c_j = \sum_{k=1}^n \gamma_{kj}a_k,$$

for some $\beta_{ki}, \gamma_{kj} \in \mathbb{Z}$. Then $a \in M_1 \cap M_2$ if and only if

$$a = \sum_{i=1}^p y_ib_i = \sum_{j=1}^q z_jc_j$$

for some $y_i, z_j \in \mathbb{Z}$. Since $a_1, \ldots, a_n$ is a basis for $M_1 + M_2$, this is equivalent to

$$\sum_{i=1}^p \beta_{ki}y_i = \sum_{j=1}^q \gamma_{kj}z_j \quad (1 \leq k \leq n),$$

or, in matrix notation, $By = Cz$, where $B = (\beta_{ki})$ and $C = (\gamma_{kj})$. But this is equivalent to the homogeneous system $Ax = O$, where

$$A = (B\ \ {-C}), \quad x = \begin{pmatrix} y \\ z \end{pmatrix},$$

and by Proposition 36 the module of solutions of this system has a finite basis.

Suppose the modules $M_1, M_2 \subseteq \mathbb{Z}^m$ are generated by the columns of the $m \times n_1$, $m \times n_2$ matrices $A_1, A_2$. Evidently $M_2$ is a submodule of $M_1$ if and only if each column of $A_2$ can be expressed as a linear combination of the columns of $A_1$, i.e. if and only if there exists an $n_1 \times n_2$ matrix $X$ such that

$$A_2 = A_1X.$$

We say in this case that $A_1$ is a left divisor of $A_2$, or that $A_2$ is a right multiple of $A_1$.

We may also define greatest common divisors and least common multiples for matrices. An $m \times p$ matrix $D$ is a greatest common left divisor of $A_1$ and $A_2$ if it is a left divisor of both $A_1$ and $A_2$, and if every left divisor $C$ of both $A_1$ and $A_2$ is also a left divisor of $D$. An $m \times q$ matrix $H$ is a least common right multiple of $A_1$ and $A_2$ if it is a right multiple of both $A_1$ and $A_2$, and if every right multiple $G$ of both $A_1$ and $A_2$ is also a right multiple of $H$. It will now be shown that these objects exist and have simple geometrical interpretations.

Let $M_1, M_2$ be the modules defined by the matrices $A_1, A_2$. We will show that if the sum $M_1 + M_2$ is defined by the matrix $D$, then $D$ is a greatest common left divisor of $A_1$ and $A_2$. In fact $D$ is a common left divisor of $A_1$ and $A_2$, since $M_1$ and $M_2$ are contained in $M_1 + M_2$. On the other hand, any common left divisor $C$ of $A_1$ and $A_2$ defines a module which contains $M_1 + M_2$, since it contains both $M_1$ and $M_2$, and so $C$ is a left divisor of $D$.

A similar argument shows that if the intersection $M_1 \cap M_2$ is defined by the matrix $H$, then $H$ is a least common right multiple of $A_1$ and $A_2$.

The sum $M_1 + M_2$ is defined, in particular, by the block matrix $(A_1\ A_2)$. There exists an invertible $(n_1 + n_2) \times (n_1 + n_2)$ matrix $U$ such that

$$(A_1\ A_2)U = (D'\ O),$$

where $D'$ is an $m \times r$ submatrix of rank $r$. If

$$U = \begin{pmatrix} U_1 & U_2 \\ U_3 & U_4 \end{pmatrix}$$

is the corresponding partition of $U$, then

$$A_1U_1 + A_2U_3 = D'.$$

On the other hand,

$$(A_1\ A_2) = (D'\ O)U^{-1}.$$

If

$$U^{-1} = \begin{pmatrix} V_1 & V_2 \\ V_3 & V_4 \end{pmatrix}$$

is the corresponding partition of $U^{-1}$, then

$$A_1 = D'V_1, \quad A_2 = D'V_2.$$

Thus $D'$ is a common left divisor of $A_1$ and $A_2$, and the previous relation implies that it is a greatest common left divisor. It follows that any greatest common left divisor $D$ of $A_1$ and $A_2$ has a right ‘Bézout’ representation $D = A_1X_1 + A_2X_2$.

We may also define coprimeness for matrices. Two matrices $A_1, A_2$ of size $m \times n_1$, $m \times n_2$ are left coprime if $I_m$ is a greatest common left divisor. If $M_1, M_2$ are the modules defined by $A_1, A_2$, this means that $M_1 + M_2 = \mathbb{Z}^m$. The definition may also be reformulated in several other ways:
Proposition 39 For any $m \times n$ matrix $A$, the following conditions are equivalent:

(i) for some, and hence every, partition $A = (A_1\ A_2)$, the submatrices $A_1$ and $A_2$ are left coprime;
(ii) there exists an $n \times m$ matrix $A^\dagger$ such that $AA^\dagger = I_m$;
(iii) there exists an $(n - m) \times n$ matrix $A^c$ such that $\begin{pmatrix} A \\ A^c \end{pmatrix}$ is invertible;
(iv) there exists an invertible $n \times n$ matrix $V$ such that $AV = (I_m\ O)$.

Proof If $A = (A_1\ A_2)$ for some left coprime matrices $A_1, A_2$, then there exist $X_1, X_2$ such that $A_1X_1 + A_2X_2 = I_m$, and hence (ii) holds. On the other hand, if (ii) holds then, for any partition $A = (A_1\ A_2)$, there exist $X_1, X_2$ such that $A_1X_1 + A_2X_2 = I_m$ and hence $A_1, A_2$ are left coprime.

Thus (i) $\Leftrightarrow$ (ii). Suppose now that (ii) holds. Then $A$ has rank $m$ and hence there exists an invertible $n \times n$ matrix $U$ such that $A = (D\ O)U$, where the $m \times m$ matrix $D$ is non-singular. In fact $D$ is invertible, since $AA^\dagger = I_m$ implies that $D$ is a left divisor of $I_m$. Consequently, by changing $U$, we may assume $D = I_m$. If we now take $A^c = (O\ I_{n-m})U$, we see that (ii) $\Rightarrow$ (iii).

It is obvious that (iii) $\Rightarrow$ (iv) and that (iv) $\Rightarrow$ (ii).


We now consider the extension of these results to other rings besides $\mathbb{Z}$. Let $R$ be an arbitrary ring. A nonempty set $M \subseteq R^m$ is said to be an $R$-module if $a, b \in M$ and $x, y \in R$ imply $xa + yb \in M$. The module $M$ is finitely generated if it contains elements $a_1, \ldots, a_n$ such that every element of $M$ has the form $x_1a_1 + \cdots + x_na_n$ for some $x_1, \ldots, x_n \in R$.

It is easily seen that if $R$ is a Bézout domain, then the whole of the preceding discussion in this section remains valid if ‘module’ is interpreted to mean ‘$R$-module’ and ‘matrix’ to mean ‘matrix with entries from $R$’. In particular, we may take $R = K[t]$ to be the ring of all polynomials in one indeterminate with coefficients from an arbitrary field $K$. However, both $\mathbb{Z}$ and $K[t]$ are principal ideal domains. In this case further results may be obtained.


Proposition 40 If $R$ is a principal ideal domain and $M$ a finitely generated $R$-module, then any submodule $L$ of $M$ is also finitely generated. Moreover, if $M$ is generated by $n$ elements, so also is $L$.

Proof Suppose $M$ is generated by $a_1, \ldots, a_n$. If $n = 1$, then any $b \in L$ has the form $b = xa_1$ for some $x \in R$ and the set of all $x$ which appear in this way is an ideal of $R$. Since $R$ is a principal ideal domain, it follows that $L$ is generated by a single element $b_1$, where $b_1 = x'a_1$ for some $x' \in R$.

Suppose now that $n > 1$ and that, for each $m < n$, any submodule of a module generated by $m$ elements is also generated by $m$ elements. Any $b \in L$ has the form

$$b = x_1a_1 + \cdots + x_na_n$$

for some $x_1, \ldots, x_n \in R$ and the set of all $x_1$ which appear in this way is an ideal of $R$. Since $R$ is a principal ideal domain, it follows that there is a fixed $b_1 \in L$ such that $b = y_1b_1 + b'$ for some $y_1 \in R$ and some $b'$ in the module $M'$ generated by $a_2, \ldots, a_n$. The set of all $b'$ which appear in this way is a submodule $L'$ of $M'$. By the induction hypothesis, $L'$ is generated by $n - 1$ elements and hence $L$ is generated by $n$ elements.


Just as it is useful to define vector spaces abstractly over an arbitrary field $K$, so it is useful to define modules abstractly over an arbitrary ring $R$. An abelian group $M$, with the group operation denoted by $+$, is said to be an $R$-module if, with any $a \in M$ and any $x \in R$, there is associated an element $xa \in M$ so that the following properties hold, for all $a, b \in M$ and all $x, y \in R$:

(i) $x(a + b) = xa + xb$,
(ii) $(x + y)a = xa + ya$,
(iii) $(xy)a = x(ya)$,
(iv) $1a = a$.

The proof of Proposition 40 remains valid for modules in this abstract sense. However, a finitely generated module need not now have a basis. For, even if it is generated by a single element $a$, we may have $xa = O$ for some nonzero $x \in R$. Nevertheless, we are going to show that, if $R$ is a principal ideal domain, all finitely generated $R$-modules can be completely characterized.

Let $R$ be a principal ideal domain and $M$ a finitely generated $R$-module, with generators $a_1, \ldots, a_n$ say. The set $N$ of all $x = (x_1, \ldots, x_n) \in R^n$ such that

$$x_1a_1 + \cdots + x_na_n = O$$

is evidently a module in $R^n$. Hence $N$ is finitely generated, by Proposition 40. The given module $M$ is isomorphic to the quotient module $R^n/N$.

Let $f_1, \ldots, f_m$ be a set of generators for $N$ and let $e_1, \ldots, e_n$ be a basis for $R^n$. Then

$$f_j = a_{j1}e_1 + \cdots + a_{jn}e_n \quad (1 \le j \le m)$$

for some $a_{jk} \in R$. The module $M$ is completely determined by the matrix $A = (a_{jk})$. However, we can change generators and change bases. If we put

$$f'_i = v_{i1}f_1 + \cdots + v_{im}f_m \quad (1 \le i \le m),$$

where $V = (v_{ij})$ is an invertible $m \times m$ matrix, then $f'_1, \ldots, f'_m$ is also a set of generators for $N$. If we put

$$e_k = u_{k1}e'_1 + \cdots + u_{kn}e'_n \quad (1 \le k \le n),$$

where $U = (u_{kl})$ is an invertible $n \times n$ matrix, then $e'_1, \ldots, e'_n$ is also a basis for $R^n$. Moreover

$$f'_i = b_{i1}e'_1 + \cdots + b_{in}e'_n \quad (1 \le i \le m),$$

where the $m \times n$ matrix $B = (b_{ik})$ is given by $B = VAU$.

The idea is to choose $V$ and $U$ so that $B$ is as simple as possible. This is made precise in the next proposition, first proved by H.J.S. Smith (1861) for $R = \mathbb{Z}$. The corresponding matrix $S$ is known as the Smith normal form of $A$.

Proposition 41 Let $R$ be a principal ideal domain and let $A$ be an $m \times n$ matrix with entries from $R$. If $A$ has rank $r$, then there exist invertible $m \times m$, $n \times n$ matrices $V$, $U$ with entries from $R$ such that $S = VAU$ has the form

$$S = \begin{pmatrix} D & O \\ O & O \end{pmatrix},$$

where $D = \mathrm{diag}[d_1, \ldots, d_r]$ is a diagonal matrix with nonzero entries $d_i$ and $d_i \mid d_j$ for $1 \le i \le j \le r$.


Proof We show first that it is enough to obtain a matrix which satisfies all the require- 
ments except the divisibility conditions for the d’s. 

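If $a$, $b$ are nonzero elements of $R$ with greatest common divisor $d$, then there exist $u, v \in R$ such that $au + bv = d$. It is easily verified that

$$\begin{pmatrix} 1 & 1 \\ -bv/d & au/d \end{pmatrix}\begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}\begin{pmatrix} u & -b/d \\ v & a/d \end{pmatrix} = \begin{pmatrix} d & 0 \\ 0 & ab/d \end{pmatrix},$$

and the outside matrices on the left-hand side are both invertible, since each has determinant $(au + bv)/d = 1$. By applying this process finitely many times, a non-singular diagonal matrix $D' = \mathrm{diag}[d'_1, \ldots, d'_r]$ may be transformed into a non-singular diagonal matrix $D = \mathrm{diag}[d_1, \ldots, d_r]$ which satisfies $d_i \mid d_j$ for $1 \le i \le j \le r$.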
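For $R = \mathbb{Z}$ the identity above can be checked mechanically. The following Python sketch (the helper names `ext_gcd`, `matmul`, `det2` and `gcd_lcm_step` are our own, not from the text) finds $u, v$ with the extended Euclidean algorithm and builds the two outside matrices exactly as displayed.

```python
def ext_gcd(a, b):
    """Return (d, u, v) with a*u + b*v == d, where d = gcd(a, b) >= 0."""
    old_r, r = a, b
    old_u, u = 1, 0
    old_v, v = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_u, u = u, old_u - q * u
        old_v, v = v, old_v - q * v
    if old_r < 0:                      # normalise the sign of the gcd
        old_r, old_u, old_v = -old_r, -old_u, -old_v
    return old_r, old_u, old_v


def matmul(X, Y):
    """Product of two 2 x 2 integer matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def det2(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]


def gcd_lcm_step(a, b):
    """For nonzero a, b return (L, R, d) with L @ diag(a, b) @ R == diag(d, ab/d)."""
    d, u, v = ext_gcd(a, b)            # a*u + b*v == d
    L = [[1, 1], [-(b // d) * v, (a // d) * u]]
    R = [[u, -(b // d)], [v, a // d]]
    return L, R, d


L, R, d = gcd_lcm_step(12, 18)
print(matmul(matmul(L, [[12, 0], [0, 18]]), R))   # [[6, 0], [0, 36]]
print(det2(L), det2(R))                            # 1 1
```

Both outside matrices have determinant $1$ for every choice of nonzero $a$, $b$, which is exactly why they are invertible over $\mathbb{Z}$.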

Consider now an arbitrary matrix $A$. By applying Proposition 35 to the transpose of $A$, we may reduce the problem to the case where $A$ has rank $m$ and then, by Corollary 38, we may suppose further that $a_{jj} \ne 0$ and $a_{jk} = 0$ for all $j < k$.

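It is now sufficient to show that, for any $2 \times 2$ matrix

$$A = \begin{pmatrix} a & 0 \\ b & c \end{pmatrix}$$

with nonzero entries $a, b, c$, there exist invertible $2 \times 2$ matrices $U$, $V$ such that $VAU$ is a diagonal matrix. Moreover, we need only prove this when the greatest common divisor $(a, b, c) = 1$. But then there exists $q \in R$ such that $(a, b + qc) = 1$. In fact, take $q$ to be the product of the distinct primes which divide $a$ but not $b$. For any prime divisor $p$ of $a$, if $p \mid b$, then $p \nmid c$, $p \nmid q$ and hence $p \nmid (b + qc)$; if $p \nmid b$, then $p \mid q$ and again $p \nmid (b + qc)$.

If we put $b' = b + qc$, then there exist $x, y \in R$ such that $ax + b'y = 1$, and hence $ax + by = 1 - qcy$. It is easily verified that

$$\begin{pmatrix} x & y \\ -b' & a \end{pmatrix}\begin{pmatrix} a & 0 \\ b & c \end{pmatrix}\begin{pmatrix} 1 & -cy \\ q & ax + by \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & ac \end{pmatrix},$$

and the outside matrices on the left-hand side are both invertible, since their determinants are $ax + b'y = 1$ and $ax + by + qcy = 1$.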
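As a sanity check over $R = \mathbb{Z}$, the following sketch builds $q$, $b'$, $x$, $y$ as in the argument above and verifies the displayed product. The helper names (`prime_divisors`, `diagonalize_triangular`, and the reused `ext_gcd`, `matmul`, `det2`) are our own, and trial division is only an assumption suitable for small entries.

```python
from math import gcd


def ext_gcd(a, b):
    """Return (d, u, v) with a*u + b*v == d, where d = gcd(a, b) >= 0."""
    old_r, r, old_u, u, old_v, v = a, b, 1, 0, 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_u, u = u, old_u - q * u
        old_v, v = v, old_v - q * v
    if old_r < 0:
        old_r, old_u, old_v = -old_r, -old_u, -old_v
    return old_r, old_u, old_v


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def det2(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]


def prime_divisors(n):
    """Distinct primes dividing |n|, by trial division (adequate for small n)."""
    n, p, primes = abs(n), 2, []
    while p * p <= n:
        if n % p == 0:
            primes.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        primes.append(n)
    return primes


def diagonalize_triangular(a, b, c):
    """For A = [[a, 0], [b, c]] with gcd(a, b, c) == 1, return (V, U) such that
    V @ A @ U == [[1, 0], [0, a*c]] and det V == det U == 1."""
    if gcd(gcd(a, b), c) != 1:
        raise ValueError("requires gcd(a, b, c) == 1")
    q = 1
    for p in prime_divisors(a):        # primes dividing a but not b
        if b % p != 0:
            q *= p
    b1 = b + q * c                     # b' = b + qc is coprime to a
    d, x, y = ext_gcd(a, b1)           # a*x + b'*y == 1
    V = [[x, y], [-b1, a]]
    U = [[1, -c * y], [q, a * x + b * y]]
    return V, U


V, U = diagonalize_triangular(4, 2, 3)
print(matmul(matmul(V, [[4, 0], [2, 3]]), U))   # [[1, 0], [0, 12]]
```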


In the important special case $R = \mathbb{Z}$, there is a more constructive proof of Proposition 41. Obviously we may suppose $A \ne O$. By interchanges of rows and columns we can arrange that $a_{11}$ is the nonzero entry of $A$ with minimum absolute value. If there is an entry $a_{1k}$ $(k > 1)$ in the first row which is not divisible by $a_{11}$, then we can write $a_{1k} = za_{11} + a'_{1k}$, where $z, a'_{1k} \in \mathbb{Z}$ and $|a'_{1k}| < |a_{11}|$. By subtracting $z$ times the first column from the $k$-th column we replace $a_{1k}$ by $a'_{1k}$. Thus we obtain a new matrix $A$ in which the minimum absolute value of the nonzero entries has been reduced.
On the other hand, if $a_{11}$ divides $a_{1k}$ for all $k > 1$ then, by subtracting multiples of the first column from the remaining columns, we can arrange that $a_{1k} = 0$ for all $k > 1$. If there is now an entry $a_{j1}$ $(j > 1)$ in the first column which is not divisible by $a_{11}$, then, by subtracting a multiple of the first row from the $j$-th row, the minimum absolute value of the nonzero entries can again be reduced. Otherwise, by subtracting multiples of the first row from the remaining rows, we can bring $A$ to the form

$$\begin{pmatrix} a_{11} & O \\ O & A' \end{pmatrix}.$$

Since $A \ne O$ and the minimum absolute value of the nonzero entries cannot be reduced indefinitely, we must in any event arrive at a matrix of this form after a finite number of steps. The same procedure can now be applied to the submatrix $A'$, and so on until we obtain a matrix

$$\begin{pmatrix} D' & O \\ O & O \end{pmatrix},$$

where $D'$ is a diagonal matrix with the same rank as $A$. As in the first part of the proof of Proposition 41, we can now replace $D'$ by a diagonal matrix $D$ which satisfies the divisibility conditions.

Clearly this constructive proof is also valid for any Euclidean domain R and, in 
particular, for the polynomial ring R = K[t], where K is an arbitrary field. 
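The constructive proof for $R = \mathbb{Z}$ translates almost line by line into code. The sketch below (the name `smith_diagonal` is ours) repeatedly moves an entry of least absolute value to the corner, reduces the first row and column modulo it, and finally imposes the divisibility conditions by the $(\gcd, \mathrm{lcm})$ replacements of the first part of the proof. It returns only the diagonal and takes each $d_k$ positive, one choice of the unit multiple; it does not record $V$ and $U$.

```python
from math import gcd


def smith_diagonal(A):
    """Diagonal [d_1, ..., d_r] (positive, d_i | d_{i+1}) of the Smith normal
    form of an integer matrix A.  V and U are not recorded."""
    M = [list(row) for row in A]
    m = len(M)
    n = len(M[0]) if m else 0
    diag = []
    for t in range(min(m, n)):
        while True:
            nonzero = [(abs(M[i][j]), i, j)
                       for i in range(t, m) for j in range(t, n) if M[i][j]]
            if not nonzero:
                break
            _, i, j = min(nonzero)
            M[t], M[i] = M[i], M[t]             # interchange rows t and i
            for row in M:                        # interchange columns t and j
                row[t], row[j] = row[j], row[t]
            p = M[t][t]                          # entry of least absolute value
            for k in range(t + 1, n):            # column k -= z * column t
                z = M[t][k] // p
                for row in M:
                    row[k] -= z * row[t]
            for i in range(t + 1, m):            # row i -= z * row t
                z = M[i][t] // p
                M[i] = [x - z * y for x, y in zip(M[i], M[t])]
            if all(M[t][k] == 0 for k in range(t + 1, n)) and \
                    all(M[i][t] == 0 for i in range(t + 1, m)):
                break                            # now of the form [[p, O], [O, A']]
        if not nonzero:
            break                                # remaining submatrix is zero
        diag.append(abs(M[t][t]))
    # Impose d_i | d_j by (gcd, lcm) replacements, as in the first part of the proof.
    for i in range(len(diag)):
        for j in range(i + 1, len(diag)):
            g = gcd(diag[i], diag[j])
            diag[i], diag[j] = g, diag[i] * diag[j] // g
    return diag


print(smith_diagonal([[2, 4, 4], [-6, 6, 12], [10, -4, -16]]))   # [2, 6, 12]
print(smith_diagonal([[2, 0], [0, 3]]))                           # [1, 6]
```

Whenever the first row or column is not yet cleared, some remainder smaller in absolute value than the corner entry is left behind, so the inner loop terminates for the same reason as in the proof.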

It will now be shown that the Smith normal form of a matrix $A$ is uniquely determined, apart from replacing each $d_k$ by an arbitrary unit multiple. For, if $S'$ is another Smith normal form, then $S' = V'SU'$ for some invertible $m \times m$, $n \times n$ matrices $V'$, $U'$. Since $d_1$ divides all entries of $S$, it also divides all entries of $S'$. In particular, $d_1 \mid d'_1$. In the same way $d'_1 \mid d_1$ and hence $d'_1$ is a unit multiple of $d_1$. To show that $d'_k$ is a unit multiple of $d_k$ also for $k > 1$, it is quickest to use determinants (Chapter V, §1). Since $d_1 \cdots d_k$ divides all $k \times k$ subdeterminants or minors of $S$, it also divides all $k \times k$ minors of $S'$. In particular, $d_1 \cdots d_k \mid d'_1 \cdots d'_k$. Similarly, $d'_1 \cdots d'_k \mid d_1 \cdots d_k$ and hence $d'_1 \cdots d'_k$ is a unit multiple of $d_1 \cdots d_k$. The conclusion now follows by induction on $k$.

The products $\Delta_k := d_1 \cdots d_k$ $(1 \le k \le r)$ are known as the invariant factors of the matrix $A$. A similar argument to that in the preceding paragraph shows that $\Delta_k$ is the greatest common divisor of all $k \times k$ minors of $A$.
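The characterization of $\Delta_k$ as a gcd of minors gives an independent way to compute the Smith normal form over $\mathbb{Z}$, since $d_1 = \Delta_1$ and $d_k = \Delta_k/\Delta_{k-1}$. The brute-force sketch below (function names are our own; it is only practical for small matrices) enumerates all $k \times k$ minors with exact rational arithmetic. Note that many other texts call the $d_k$ the invariant factors and the $\Delta_k$ the determinantal divisors.

```python
from fractions import Fraction
from itertools import combinations
from math import gcd


def det(M):
    """Exact determinant of a square integer matrix (Gaussian elimination over Q)."""
    M = [[Fraction(x) for x in row] for row in M]
    n, result = len(M), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return 0
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            result = -result
        result *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return result.numerator            # integral, since the entries are integers


def minor_gcds(A):
    """[Delta_1, ..., Delta_r]: Delta_k is the gcd of all k x k minors of A."""
    m, n = len(A), len(A[0])
    deltas = []
    for k in range(1, min(m, n) + 1):
        g = 0
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                g = gcd(g, det([[A[i][j] for j in cols] for i in rows]))
        if g == 0:                     # all k x k minors vanish: k exceeds the rank
            break
        deltas.append(g)
    return deltas


A = [[2, 4, 4], [-6, 6, 12], [10, -4, -16]]
deltas = minor_gcds(A)
d = [deltas[0]] + [deltas[k] // deltas[k - 1] for k in range(1, len(deltas))]
print(deltas, d)                       # [2, 12, 144] [2, 6, 12]
```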

Two $m \times n$ matrices $A$, $B$ are said to be equivalent if there exist invertible $m \times m$, $n \times n$ matrices $V$, $U$ such that $B = VAU$. Since equivalence is indeed an ‘equivalence relation’, the uniqueness of the Smith normal form implies that two $m \times n$ matrices $A$, $B$ are equivalent if and only if they have the same rank and the same invariant factors.

We return now from matrices to modules. Let $R$ be a principal ideal domain and $M$ a finitely generated $R$-module, with generators $a_1, \ldots, a_n$. The Smith normal form tells us that $M$ has generators $a'_1, \ldots, a'_n$, where

$$a'_k = u_{k1}a_1 + \cdots + u_{kn}a_n \quad (1 \le k \le n)$$

for some invertible matrix $U = (u_{kl})$, such that $d_ka'_k = 0$ $(1 \le k \le r)$. Moreover,

$$x_1a'_1 + \cdots + x_na'_n = 0$$

implies $x_k = y_kd_k$ for some $y_k \in R$ if $1 \le k \le r$ and $x_k = 0$ if $r < k \le n$. In particular, $x_ka'_k = 0$ for $1 \le k \le n$, and thus the module $M$ is the direct sum of the submodules $M'_1, \ldots, M'_n$ generated by $a'_1, \ldots, a'_n$ respectively.

If $N_k$ denotes the set of all $x \in R$ such that $xa'_k = 0$, then $N_k$ is the principal ideal of $R$ generated by $d_k$ for $1 \le k \le r$ and $N_k = \{0\}$ for $r < k \le n$. The divisibility conditions on the $d$'s imply that $N_{k+1} \subseteq N_k$ $(1 \le k < r)$. If $N_k = R$ for some $k$, then $a'_k$ contributes nothing as a generator and may be omitted.

Evidently the submodule $M'$ generated by $a'_1, \ldots, a'_r$ consists of all $a \in M$ such that $xa = 0$ for some nonzero $x \in R$, and the submodule $M''$ generated by $a'_{r+1}, \ldots, a'_n$ has $a'_{r+1}, \ldots, a'_n$ as a basis. Thus we have proved the structure theorem for finitely generated modules over a principal ideal domain:


Proposition 42 Let $R$ be a principal ideal domain and $M$ a finitely generated $R$-module. Then $M$ is the direct sum of two submodules $M'$ and $M''$, where $M'$ consists of all $a \in M$ such that $xa = 0$ for some nonzero $x \in R$ and $M''$ has a finite basis. Moreover, $M'$ is the direct sum of $s$ submodules $Ra_1, \ldots, Ra_s$, such that

$$\{0\} \subset N_s \subseteq \cdots \subseteq N_1 \subset R,$$

where $N_k$ is the ideal consisting of all $x \in R$ such that $xa_k = 0$ $(1 \le k \le s)$.


The uniquely determined submodule $M'$ is called the torsion submodule of $M$. The free submodule $M''$ is not uniquely determined, although the number of elements in a basis is uniquely determined. Of course, for a particular $M$ one may have $M' = \{0\}$ or $M'' = \{0\}$.

Any abelian group $A$, with the group operation denoted by $+$, may be regarded as a $\mathbb{Z}$-module by defining $na$ to be the sum $a + \cdots + a$ with $n$ summands if $n \in \mathbb{N}$, to be $0$ if $n = 0$, and to be $-(a + \cdots + a)$ with $-n$ summands if $-n \in \mathbb{N}$. The structure theorem in this case becomes the structure theorem for finitely generated abelian groups: any finitely generated abelian group $A$ is the direct product of finitely many finite or infinite cyclic subgroups. The finite cyclic subgroups have orders $d_1, \ldots, d_s$, where $d_1 > 1$ if $s > 0$ and $d_i \mid d_j$ if $i < j$. In particular, $A$ is the direct product of a finite subgroup $A'$ (of order $d_1 \cdots d_s$), its torsion subgroup, and a free subgroup $A''$.

The fundamental structure theorem also has an important application to linear algebra. Let $V$ be a vector space over a field $K$ and $T : V \to V$ a linear transformation. We can give $V$ the structure of a $K[t]$-module by defining, for any $v \in V$ and any $f = a_0 + a_1t + \cdots + a_nt^n \in K[t]$,

$$fv = a_0v + a_1Tv + \cdots + a_nT^nv.$$

If $V$ is finite-dimensional, then for any $v \in V$ there is a nonzero polynomial $f$ such that $fv = 0$. In this case the fundamental structure theorem says that $V$ is the direct sum of finitely many subspaces $V_1, \ldots, V_s$ which are invariant under $T$. If $V_i$ has dimension $n_i \ge 1$, then there exists a vector $w_i \in V_i$ such that $w_i, Tw_i, \ldots, T^{n_i - 1}w_i$ are a vector space basis for $V_i$ $(1 \le i \le s)$. There is a uniquely determined monic polynomial $m_i$ of degree $n_i$ such that $m_i(T)w_i = 0$ and, finally, $m_i \mid m_j$ if $i < j$.

The Smith normal form can be used to solve systems of linear ordinary differential equations with constant coefficients. Such a system has the form

$$\begin{aligned} a_{11}(D)x_1 + \cdots + a_{1n}(D)x_n &= c_1(t),\\ a_{21}(D)x_1 + \cdots + a_{2n}(D)x_n &= c_2(t),\\ &\;\;\vdots\\ a_{m1}(D)x_1 + \cdots + a_{mn}(D)x_n &= c_m(t), \end{aligned}$$

where the coefficients $a_{jk}(D)$ are polynomials in $D = d/dt$ with complex coefficients and the right sides $c_j(t)$ are, say, infinitely differentiable functions of the time $t$. Since $\mathbb{C}[s]$ is a Euclidean domain, we can bring the coefficient matrix $A = (a_{jk}(D))$ to Smith normal form and thus replace the given system by an equivalent system in which the variables are ‘uncoupled’.

For the polynomial ring $R = K[t]$ it is possible to say more about $R$-modules than for an arbitrary Euclidean domain, since the absolute value

$$|f| = 2^{\deg f} \text{ if } f \ne 0, \qquad |0| = 0,$$

has not only the Euclidean property, but also the properties

$$|f + g| \le \max\{|f|, |g|\}, \qquad |fg| = |f||g| \quad \text{for any } f, g \in R.$$


For any $a \in R^n$, where $R = K[t]$, define $|a|$ to be the maximum absolute value of any of its coordinates. Then a basis for a module $M \subseteq R^n$ can be obtained in the following way. Suppose $M \ne \{0\}$ and choose a nonzero element $a_1$ of $M$ for which $|a_1|$ is a minimum. If there is an element of $M$ which is not of the form $p_1a_1$ with $p_1 \in R$, choose one, $a_2$, for which $|a_2|$ is a minimum. If there is an element of $M$ which is not of the form $p_1a_1 + p_2a_2$ with $p_1, p_2 \in R$, choose one, $a_3$, for which $|a_3|$ is a minimum. And so on.

Evidently $|a_1| \le |a_2| \le \cdots$. We will show that $a_1, a_2, \ldots$ are linearly independent for as long as the process can be continued, and thus ultimately a basis is obtained.

If this is not the case, then there exists a positive integer $k$ such that $a_1, \ldots, a_k$ are linearly independent, but $a_1, \ldots, a_{k+1}$ are not. Hence there exist $s_1, \ldots, s_{k+1} \in R$ with $s_{k+1} \ne 0$ such that $s_1a_1 + \cdots + s_{k+1}a_{k+1} = 0$. For each $j \le k$, there exist $q_j, r_j \in R$ such that

$$s_j = q_js_{k+1} + r_j, \qquad |r_j| < |s_{k+1}|.$$

Put

$$a'_{k+1} = a_{k+1} + q_1a_1 + \cdots + q_ka_k, \qquad b_k = r_1a_1 + \cdots + r_ka_k.$$

Since $a_{k+1}$ is not of the form $p_1a_1 + \cdots + p_ka_k$, neither is $a'_{k+1}$ and hence $|a'_{k+1}| \ge |a_{k+1}|$. Furthermore, $|b_k| \le \max_{1 \le j \le k} |r_j||a_j| < |s_{k+1}||a_{k+1}|$. Since $b_k = -s_{k+1}a'_{k+1}$ by construction, and so $|b_k| = |s_{k+1}||a'_{k+1}| \ge |s_{k+1}||a_{k+1}|$, this is a contradiction.

A basis for $M$ which is obtained in this way will be called a minimal basis. It is not difficult to show that a basis $a_1, \ldots, a_m$ is a minimal basis if and only if $|a_1| \le \cdots \le |a_m|$ and the sum $|a_1| + \cdots + |a_m|$ is minimal. Although a minimal basis is not uniquely determined, the values $|a_1|, \ldots, |a_m|$ are uniquely determined.
5 Further Remarks

For the history of the law of quadratic reciprocity, see Frei [16]. The first two proofs by Gauss of the law of quadratic reciprocity appeared in §§125-145 and §262 of [17]. A simplified account of Gauss’s inductive proof has been given by Brown [7]. The proofs most commonly given use ‘Gauss’s lemma’ and are variants of Gauss’s third proof. The first proof given here, due to Rousseau [46], is of this general type, but it does not use Gauss’s lemma and is based on a natural definition of the Jacobi symbol. For an extension of this definition of Zolotareff to algebraic number fields, see Cartier [9].

For Dirichlet’s evaluation of Gauss sums, see [33]. A survey of Gauss sums is given 
in Berndt and Evans [6]. 

The extension of the law of quadratic reciprocity to arbitrary algebraic number 
fields was the subject of Hilbert’s 9th Paris problem. Although such generalizations lie 
outside the scope of the present work, it may be worthwhile to give a brief glimpse. 
Let $K = \mathbb{Q}$ be the field of rational numbers and let $L = \mathbb{Q}(\sqrt{d})$ be a quadratic extension of $K$. If $p$ is a prime in $K$, the law of quadratic reciprocity may be interpreted as
describing how the ideal generated by p in L factors into prime ideals. Now let K be 
an arbitrary algebraic number field and let L be any finite extension of K. Quite gener- 
ally, we may ask how the arithmetic of the extension L is determined by the arithmetic 
of K. The general reciprocity law, conjectured by Artin in 1923 and proved by him 
in 1927, gives an answer in the form of an isomorphism between two groups, provided 
the Galois group of L over K is abelian. For an introduction, see Wyman [54] and, for 
more detail, Tate [51]. The outstanding problem is to find a meaningful extension to the 
case when the Galois group is non-abelian. Some intriguing conjectures are provided 
by the Langlands program, for which see also Gelbart [18]. 

The law of quadratic reciprocity has an analogue for polynomials with coefficients from a finite field. Let $\mathbb{F}_q$ be a finite field containing $q$ elements, where $q$ is a power of an odd prime. If $g \in \mathbb{F}_q[x]$ is a monic irreducible polynomial of positive degree, then for any $f \in \mathbb{F}_q[x]$ not divisible by $g$ we define $(f/g)$ to be $1$ if $f$ is congruent to a square mod $g$, and $-1$ otherwise. The law of quadratic reciprocity, which in the case of prime $q$ was stated by Dedekind (1857) and proved by Artin (1924), says that

$$(f/g)(g/f) = (-1)^{mn(q-1)/2}$$

for any distinct monic irreducible polynomials $f, g \in \mathbb{F}_q[x]$ of positive degrees $m, n$. Artin also developed a theory of ideals, analogous to that for quadratic number fields, for the field obtained by adjoining to $\mathbb{F}_q[x]$ an element $\omega$ with $\omega^2 = D(x)$, where $D(x) \in \mathbb{F}_q[x]$ is square-free; see [3].
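The reciprocity law for polynomials can be checked by brute force in small cases. The sketch below is our own illustration, not a proof: it assumes $q = p$ is an odd prime, restricts to degrees at most $3$ (where a polynomial of degree $\ge 2$ is irreducible exactly when it has no root in $\mathbb{F}_p$), and decides $(f/g)$ by listing all squares modulo $g$. Polynomials are coefficient lists from the constant term upwards; all function names are hypothetical.

```python
from itertools import product


def trim(f):
    """Remove zero coefficients from the high end (coefficients listed low to high)."""
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f


def mul(f, g, p):
    """Product of f and g in F_p[x]."""
    out = [0] * (len(f) + len(g))
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % p
    return trim(out)


def rem(f, g, p):
    """Remainder of f on division by the monic polynomial g in F_p[x]."""
    f = trim([c % p for c in f])
    while len(f) >= len(g):
        c, shift = f[-1], len(f) - len(g)
        for i, b in enumerate(g):
            f[shift + i] = (f[shift + i] - c * b) % p
        f = trim(f)
    return f


def monic_polys(deg, p):
    for low in product(range(p), repeat=deg):
        yield list(low) + [1]


def irreducible_deg_le_3(f, p):
    """Irreducibility test valid only for degree <= 3: no root in F_p."""
    if len(f) == 2:
        return True
    return all(sum(c * x ** i for i, c in enumerate(f)) % p for x in range(p))


def symbol(f, g, p):
    """(f/g) = 1 if f is congruent to a square mod g, and -1 otherwise."""
    r = rem(f, g, p)
    assert r, "f must not be divisible by g"
    squares = {tuple(rem(mul(list(h), list(h), p), g, p))
               for h in product(range(p), repeat=len(g) - 1)}
    return 1 if tuple(r) in squares else -1


def check_reciprocity(p, max_deg):
    """Check (f/g)(g/f) == (-1)^(mn(p-1)/2) for distinct monic irreducibles."""
    assert max_deg <= 3
    irr = [f for d in range(1, max_deg + 1) for f in monic_polys(d, p)
           if irreducible_deg_le_3(f, p)]
    for f in irr:
        for g in irr:
            if f != g:
                m, n = len(f) - 1, len(g) - 1
                expected = (-1) ** (m * n * (p - 1) // 2)
                assert symbol(f, g, p) * symbol(g, f, p) == expected, (f, g)
    return len(irr)


print(check_reciprocity(3, 3))   # 14 monic irreducibles of degree 1..3 over F_3
```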

Quadratic fields are treated in the third volume of Landau [30]. There is also a 
useful resumé accompanying the tables in Ince [23]. 

A complex number is said to be algebraic if it is a root of a monic polynomial 
with rational coefficients and transcendental otherwise. Hence a complex number is 
algebraic if and only if it is an element of some algebraic number field. 

For an introduction to the theory of algebraic number fields, see Samuel [47]. This 
vast theory may be approached in a variety of ways. For a more detailed treatment 
the student may choose from Hecke [22], Hasse [20], Lang [32], Narkiewicz [38] and 
Neukirch [39]. There are useful articles in Cassels and Fröhlich [10], and Artin [2]
treats also algebraic functions.

For the early history of Fermat’s last theorem, see Vandiver [52], Ribenboim [41] and Kummer [28]. Further references will be given in Chapter XIII.

Arithmetical functions are discussed in Apostol [1], McCarthy [35] and Sivaramakrishnan [48]. The term ‘Dirichlet product’ comes from the connection with Dirichlet series, which will be considered in Chapter IX, §6. The ring of all arithmetical functions was shown to be a factorial domain by Cashwell and Everett (1959); the result is proved in [48].

In the form $f(a \wedge b)f(a \vee b) = f(a)f(b)$, the concept of multiplicative function can be extended to any map $f : L \to \mathbb{C}$, where $L$ is a lattice. Möbius inversion can be extended to any locally finite partially ordered set and plays a significant role in modern combinatorics; see Bender and Goldman [5], Rota [45] and Barnabei et al. [4].

The early history of perfect numbers and Fermat numbers is described in Dickson [13]. It has been proved that any odd perfect number, if such a thing exists, must be greater than $10^{300}$ and have at least 8 distinct prime factors. On the other hand, if an odd perfect number $N$ has at most $k$ distinct prime factors, then $N < 4^{4^k}$ and thus all such $N$ can be found by a finite amount of computation. See te Riele [42] and Heath-Brown [21].

The proof of the Lucas–Lehmer test for Mersenne primes follows Rosen [43] and Bruce [8]. For the conjectured distribution of Mersenne primes, see Wagstaff [53].

The construction of regular polygons by ruler and compass is discussed in Hadlock [19], Jacobson [24] and Morandi [36].

Much of the material in §4 is also discussed in Macduffee [34] and Newman [40]. 
Corollary 34 was proved by Hermite (1849), who later (1851) also proved 
Corollary 38. Indeed the latter result is the essential content of Hermite’s normal form, 
which will be encountered in Chapter VIII, §2. 

It is clear that Corollary 34 remains valid if the underlying ring $\mathbb{Z}$ is replaced by any principal ideal domain. There have recently been some noteworthy extensions to more general rings. It may be asked, for an arbitrary commutative ring $R$ and any $a_1, \ldots, a_n \in R$, does there exist an invertible $n \times n$ matrix $U$ with entries from $R$ which has $a_1, \ldots, a_n$ as its first row? It is obviously necessary that there exist $x_1, \ldots, x_n \in R$ such that

$$a_1x_1 + \cdots + a_nx_n = 1,$$

i.e. that the ideal generated by $a_1, \ldots, a_n$ be the whole ring $R$. If $n = 2$, this necessary condition is also sufficient, by the same observation as when invertibility of matrices was first considered for $R = \mathbb{Z}$. However, if $n > 2$ there exist even factorial domains $R$ for which the condition is not sufficient. In 1976 Quillen and Suslin independently proved the twenty-year-old conjecture that it is sufficient if $R = K[t_1, \ldots, t_d]$ is the ring of polynomials in finitely many indeterminates with coefficients from an arbitrary field $K$.

By pursuing an analogy between projective modules in algebra and vector bundles 
in topology, Serre (1955) had been led to conjecture that, for $R = K[t_1, \ldots, t_d]$, if an
R-module has a finite basis and is the direct sum of two submodules, then each of these
submodules has a finite basis. Seshadri (1958) proved the conjecture for d = 2 and in
the same year Serre showed that, for arbitrary d, it would follow from the result which
Quillen and Suslin subsequently proved.

For proofs of these results and for later developments, see Lam [29], Fitchas and Galligo [14], and Swan [50]. There is a short proof of the Quillen–Suslin theorem in Lang [31].

For Smith’s normal form, see Smith [49] and Kaplansky [27]. It was shown by 
Wedderburn (1915) that Smith’s normal form also holds for matrices of holomor- 
phic functions, even though the latter do not form a principal ideal domain; see 
Narasimhan [37]. 

Finitely generated commutative groups are important, not only because more can 
be said about them, but also because they arise in practice. Dirichlet’s unit theorem 
says that the units of an algebraic number field form a finitely generated commutative 
group. As will be seen in Chapter XIII, §4, Mordell’s theorem says that the rational
points of an elliptic curve also form a finitely generated commutative group. 

Modules over a polynomial ring K[s] play an important role in what electrical 
engineers call linear systems theory. Connected accounts are given in Kalman [26], 
Rosenbrock [44] and Kailath [25]. For some further mathematical developments, see 
Forney [15], Coppel [11], and Coppel and Cullen [12]. 
IV


Continued Fractions and Their Uses 


1 The Continued Fraction Algorithm 


Let $\xi = \xi_0$ be an irrational real number. Then we can write

$$\xi_0 = a_0 + \xi_1^{-1},$$

where $a_0 = \lfloor \xi_0 \rfloor$ is the greatest integer $\le \xi_0$ and where $\xi_1 > 1$ is again an irrational number. Hence the process can be repeated indefinitely:

$$\xi_1 = a_1 + \xi_2^{-1} \quad (a_1 = \lfloor \xi_1 \rfloor,\ \xi_2 > 1),$$
$$\xi_2 = a_2 + \xi_3^{-1} \quad (a_2 = \lfloor \xi_2 \rfloor,\ \xi_3 > 1),$$
$$\cdots$$

By construction, $a_n \in \mathbb{Z}$ for all $n \ge 0$ and $a_n \ge 1$ if $n \ge 1$. The uniquely determined infinite sequence $[a_0, a_1, a_2, \ldots]$ is called the continued fraction expansion of $\xi$. The continued fraction expansion of $\xi_n$ is $[a_n, a_{n+1}, a_{n+2}, \ldots]$.

For example, the ‘golden ratio’ $\tau = (1 + \sqrt{5})/2$ has the continued fraction expansion $[1, 1, 1, \ldots]$, since $\tau = 1 + 1/\tau$. Similarly, $\sqrt{2}$ has the continued fraction expansion $[1, 2, 2, \ldots]$, since $\sqrt{2} + 1 = 2 + 1/(\sqrt{2} + 1)$.
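Run naively in floating point, the algorithm soon loses accuracy. For numbers of the form $\xi = (P + \sqrt{N})/Q$, such as the two examples above, the steps $\xi_n = a_n + \xi_{n+1}^{-1}$ can instead be carried out exactly in integers. The sketch below uses that standard integer recurrence, which is not described in the text; the function name `quadratic_cf` and its preconditions are our own assumptions.

```python
from math import isqrt


def quadratic_cf(P, Q, N, terms):
    """First `terms` partial quotients of xi = (P + sqrt(N)) / Q.

    Assumptions (ours): N > 0 is not a perfect square, Q > 0, and Q divides
    N - P*P.  The loop asserts that Q stays positive, which holds for the
    examples below but is not checked in general.
    """
    s = isqrt(N)
    assert s * s != N and Q > 0 and (N - P * P) % Q == 0
    out = []
    for _ in range(terms):
        assert Q > 0
        a = (P + s) // Q          # equals floor(xi) since Q > 0 and sqrt(N) is irrational
        out.append(a)
        P = a * Q - P             # xi - a = 1 / xi', where xi' = (P + sqrt(N)) / Q
        Q = (N - P * P) // Q      # with the updated P and this new Q
    return out


print(quadratic_cf(0, 1, 2, 6))   # sqrt(2):        [1, 2, 2, 2, 2, 2]
print(quadratic_cf(1, 2, 5, 6))   # (1 + sqrt 5)/2: [1, 1, 1, 1, 1, 1]
```

The divisibility $Q \mid N - P^2$ is preserved at every step, so each new $Q$ is again an integer.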

The relation between $\xi_n$ and $\xi_{n+1}$ may be written as a linear fractional transformation:

$$\xi_n = (a_n\xi_{n+1} + 1)/(\xi_{n+1} + 0).$$

An arbitrary linear fractional transformation

$$\xi = (\alpha\xi' + \beta)/(\gamma\xi' + \delta)$$

is completely determined by its matrix

$$T = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}.$$

This description is convenient, because if we make a further linear fractional transformation

$$\xi' = (\alpha'\xi'' + \beta')/(\gamma'\xi'' + \delta')$$

with matrix

$$T' = \begin{pmatrix} \alpha' & \beta' \\ \gamma' & \delta' \end{pmatrix},$$

then the matrix $T''$ of the composite transformation

$$\xi = (\alpha''\xi'' + \beta'')/(\gamma''\xi'' + \delta'')$$

is given by the matrix product $T'' = TT'$.
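A quick exact check of $T'' = TT'$: the sketch applies two sample transformations in turn and compares the result with the transformation of the product matrix. The matrices and the sample value are arbitrary choices of ours, and `apply` and `matmul` are hypothetical helper names.

```python
from fractions import Fraction


def apply(T, x):
    """The linear fractional transformation x -> (alpha*x + beta)/(gamma*x + delta)."""
    (alpha, beta), (gamma, delta) = T
    return (alpha * x + beta) / (gamma * x + delta)


def matmul(T, S):
    return [[sum(T[i][k] * S[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


T = [[2, 1], [1, 0]]          # xi  = (2 xi' + 1) / xi'
T_prime = [[3, 1], [1, 0]]    # xi' = (3 xi'' + 1) / xi''
x = Fraction(7, 5)            # a sample value of xi''

print(apply(T, apply(T_prime, x)))    # 59/26
print(apply(matmul(T, T_prime), x))   # 59/26
```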
It follows that, if we set

$$A_n = \begin{pmatrix} a_n & 1 \\ 1 & 0 \end{pmatrix},$$

then the matrix of the linear fractional transformation which expresses $\xi$ in terms of $\xi_{n+1}$ is

$$T_n = A_0 \cdots A_n.$$

It is readily verified by induction that

$$T_n = \begin{pmatrix} p_n & p_{n-1} \\ q_n & q_{n-1} \end{pmatrix},$$

i.e.,

$$\xi = (p_n\xi_{n+1} + p_{n-1})/(q_n\xi_{n+1} + q_{n-1}),$$

where the elements $p_n$, $q_n$ satisfy the recurrence relations

$$p_n = a_np_{n-1} + p_{n-2}, \qquad q_n = a_nq_{n-1} + q_{n-2} \quad (n \ge 0), \tag{1}$$

with the conventional starting values

$$p_{-2} = 0, \quad p_{-1} = 1, \quad \text{resp.} \quad q_{-2} = 1, \quad q_{-1} = 0. \tag{2}$$

In particular,

$$p_0 = a_0, \quad p_1 = a_1a_0 + 1, \quad q_0 = 1, \quad q_1 = a_1.$$

Since $\det A_n = -1$, by taking determinants we obtain

$$p_nq_{n-1} - p_{n-1}q_n = (-1)^{n+1} \quad (n \ge 0). \tag{3}$$

By (1) also,

$$\begin{pmatrix} p_n & p_{n-1} \\ q_n & q_{n-1} \end{pmatrix}\begin{pmatrix} 1 & 1 \\ -a_n & 0 \end{pmatrix} = \begin{pmatrix} p_{n-2} & p_n \\ q_{n-2} & q_n \end{pmatrix},$$

from which, by taking determinants again, we get

$$p_nq_{n-2} - p_{n-2}q_n = (-1)^na_n \quad (n \ge 0). \tag{4}$$

It follows from (1)-(2) that $p_n$ and $q_n$ are integers, and from (3) that they are coprime. Since $a_n \ge 1$ for $n \ge 1$, we have

$$1 = q_0 \le q_1 < q_2 < \cdots.$$

Thus $q_n \ge n$ for $n \ge 1$. (In fact, since $q_n \ge q_{n-1} + q_{n-2}$ for $n \ge 1$, it is readily shown by induction that $q_n \ge \tau^{n-1}$ for $n \ge 1$, where $\tau = (1 + \sqrt{5})/2$.)
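The recurrence (1) with the starting values (2) is easy to run. The sketch below (the function name `convergents` is ours) computes the first convergents of $\sqrt{2} = [1, 2, 2, \ldots]$, checks identities (3) and (4), and checks the bound $q_n \ge \tau^{n-1}$ in the extreme case where every $a_n = 1$, when the $q_n$ are Fibonacci numbers.

```python
from math import sqrt


def convergents(a):
    """Lists p, q with p[n]/q[n] the n-th convergent, via recurrence (1)-(2)."""
    p, q = [0, 1], [1, 0]          # p_{-2}, p_{-1} and q_{-2}, q_{-1}
    for a_n in a:
        p.append(a_n * p[-1] + p[-2])
        q.append(a_n * q[-1] + q[-2])
    return p[2:], q[2:]            # drop the two starting values


a = [1] + [2] * 9                  # partial quotients of sqrt(2)
p, q = convergents(a)
print([f"{x}/{y}" for x, y in zip(p, q)][:5])   # ['1/1', '3/2', '7/5', '17/12', '41/29']

# (3): p_n q_{n-1} - p_{n-1} q_n = (-1)^(n+1)
assert all(p[n] * q[n - 1] - p[n - 1] * q[n] == (-1) ** (n + 1) for n in range(1, len(a)))
# (4): p_n q_{n-2} - p_{n-2} q_n = (-1)^n a_n
assert all(p[n] * q[n - 2] - p[n - 2] * q[n] == (-1) ** n * a[n] for n in range(2, len(a)))

# q_n >= tau^(n-1) for n >= 1, tested when every a_n = 1 (Fibonacci denominators)
tau = (1 + sqrt(5)) / 2
_, q_golden = convergents([1] * 20)
assert all(q_golden[n] >= tau ** (n - 1) - 1e-9 for n in range(1, 20))
```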
Since $q_n > 0$ for $n \ge 0$, we can rewrite (3), (4) in the forms

$$p_n/q_n - p_{n-1}/q_{n-1} = (-1)^{n+1}/(q_{n-1}q_n) \quad (n \ge 1), \tag{3'}$$
$$p_n/q_n - p_{n-2}/q_{n-2} = (-1)^na_n/(q_{n-2}q_n) \quad (n \ge 2). \tag{4'}$$

It follows that the sequence $\{p_{2n}/q_{2n}\}$ is increasing, the sequence $\{p_{2n+1}/q_{2n+1}\}$ is decreasing, and every member of the first sequence is less than every member of the second sequence. Hence both sequences have limits and actually, since $q_n \to \infty$, the limits of the two sequences are the same.
From

$$\xi = (p_n\xi_{n+1} + p_{n-1})/(q_n\xi_{n+1} + q_{n-1})$$

we obtain

$$q_n\xi - p_n = (p_{n-1}q_n - p_nq_{n-1})/(q_n\xi_{n+1} + q_{n-1}) = (-1)^n/(q_n\xi_{n+1} + q_{n-1}).$$

Hence $\xi > p_n/q_n$ if $n$ is even and $\xi < p_n/q_n$ if $n$ is odd. It follows that $p_n/q_n \to \xi$ as $n \to \infty$. Consequently different irrational numbers have different continued fraction expansions.

Since $\xi$ lies between $p_n/q_n$ and $p_{n+1}/q_{n+1}$, we have

$$|p_{n+2}/q_{n+2} - p_n/q_n| < |\xi - p_n/q_n| < |p_{n+1}/q_{n+1} - p_n/q_n|.$$

By (3') and (4') we can rewrite this in the form

$$a_{n+2}/(q_nq_{n+2}) < |\xi - p_n/q_n| < 1/(q_nq_{n+1}) \quad (n \ge 0). \tag{5}$$

Hence

$$a_{n+2}/q_{n+2} < |q_n\xi - p_n| < 1/q_{n+1},$$

which shows that $|q_n\xi - p_n|$ decreases as $n$ increases. It follows that $|\xi - p_n/q_n|$ also decreases as $n$ increases.
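Inequality (5) can be observed numerically. The sketch below takes $\xi = \sqrt{2}$ in double precision, an assumption that is comfortably accurate for the first dozen convergents, and checks both bounds together with the decrease of $|q_n\xi - p_n|$.

```python
from math import sqrt


def convergents(a):
    p, q = [0, 1], [1, 0]          # starting values (2)
    for a_n in a:                  # recurrence (1)
        p.append(a_n * p[-1] + p[-2])
        q.append(a_n * q[-1] + q[-2])
    return p[2:], q[2:]


xi = sqrt(2)
a = [1] + [2] * 12                 # partial quotients of sqrt(2)
p, q = convergents(a)
for n in range(len(a) - 2):
    err = abs(xi - p[n] / q[n])
    lower = a[n + 2] / (q[n] * q[n + 2])
    upper = 1 / (q[n] * q[n + 1])
    assert lower < err < upper, n                                   # inequality (5)
    assert abs(q[n + 1] * xi - p[n + 1]) < abs(q[n] * xi - p[n])    # decreasing
```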

The rational number $p_n/q_n$ is called the $n$-th convergent of $\xi$. The integers $a_n$ are called the partial quotients and the real numbers $\xi_n$ the complete quotients in the continued fraction expansion of $\xi$.

The continued fraction algorithm can be applied also when $\xi = \xi_0$ is rational, but in this case it is really the same as the Euclidean algorithm. For suppose $\xi_n = b_n/c_n$, where $b_n$ and $c_n$ are integers and $c_n > 0$. We can write

$$b_n = a_nc_n + c_{n+1},$$

where $a_n = \lfloor \xi_n \rfloor$ and $c_{n+1}$ is an integer such that $0 \le c_{n+1} < c_n$. Thus $\xi_{n+1}$ is defined if and only if $c_{n+1} \ne 0$, and then $\xi_{n+1} = c_n/c_{n+1}$. Since the sequence of positive integers $\{c_n\}$ cannot decrease for ever, the continued fraction algorithm for a rational number $\xi$ always terminates. At the last stage of the algorithm we have simply

$$\xi_N = a_N,$$

where $a_N > 1$ if $N > 0$. The uniquely determined finite sequence $[a_0, a_1, \ldots, a_N]$ is called the continued fraction expansion of $\xi$.
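For rational $\xi = b/c$ the algorithm is literally the Euclidean algorithm: each step is one `divmod`. The sketch below (function names ours) computes $[a_0, \ldots, a_N]$ and evaluates it back from the innermost term to confirm that the expansion recovers $\xi$.

```python
from fractions import Fraction


def rational_cf(b, c):
    """Continued fraction [a_0, ..., a_N] of b/c (c > 0) by the Euclidean algorithm."""
    assert c > 0
    out = []
    while c != 0:
        a, r = divmod(b, c)     # b = a*c + r with 0 <= r < c, so a = floor(b/c)
        out.append(a)
        b, c = c, r
    return out


def evaluate(cf):
    """Value of [a_0, ..., a_N], computed from the innermost term outwards."""
    x = Fraction(cf[-1])
    for a in reversed(cf[:-1]):
        x = a + 1 / x
    return x


print(rational_cf(355, 113))    # [3, 7, 16]
print(rational_cf(-7, 3))       # [-3, 1, 2]
```

Python's `divmod` uses floor division, so $a_0 = \lfloor \xi \rfloor$ also for negative $\xi$, and the last partial quotient exceeds $1$ whenever $N > 0$, as stated above.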

Convergents and complete quotients can be defined as before; all the properties derived for $\xi$ irrational continue to hold for $\xi$ rational, provided we do not go past $n = N$. The relation

$$\xi = (p_{N-1}\xi_N + p_{N-2})/(q_{N-1}\xi_N + q_{N-2})$$

now shows that

$$\xi = p_N/q_N.$$

Consequently different rational numbers have different continued fraction expansions.

Now let $a_0, a_1, a_2, \ldots$ be any infinite sequence of integers with $a_n \ge 1$ for $n \ge 1$. If we define integers $p_n$, $q_n$ by the recurrence relations (1)-(2), our previous argument shows that the sequence $\{p_{2n}/q_{2n}\}$ is increasing, the sequence $\{p_{2n+1}/q_{2n+1}\}$ is decreasing, and the two sequences have a common limit $\xi$. If we put $\xi_0 = \xi$ and

$$\xi_{n+1} = -(q_{n-1}\xi - p_{n-1})/(q_n\xi - p_n) \quad (n \ge 0),$$

our previous argument shows also that $\xi_{n+1} > 1$ $(n \ge 0)$. Since

$$\xi_n = a_n + \xi_{n+1}^{-1},$$

it follows that $a_n = \lfloor \xi_n \rfloor$. Hence $\xi$ is irrational and $[a_0, a_1, a_2, \ldots]$ is its continued fraction expansion.

Similarly it may be seen that, for any finite sequence of integers $a_0, a_1, \ldots, a_N$, with $a_n \ge 1$ for $1 \le n < N$ and $a_N > 1$ if $N > 0$, there is a rational number $\xi$ with $[a_0, a_1, \ldots, a_N]$ as its continued fraction expansion.

We will write simply $\xi = [a_0, a_1, \ldots, a_N]$ if $\xi$ is rational and $\xi = [a_0, a_1, a_2, \ldots]$ if $\xi$ is irrational.

We will later have need of the following result:

Lemma 0 Let $\xi$ be an irrational number with complete quotients $\xi_n$ and convergents $p_n/q_n$. If $\eta$ is any irrational number different from $\xi$, and if we define $\eta_{n+1}$ by

$$\eta = (p_n\eta_{n+1} + p_{n-1})/(q_n\eta_{n+1} + q_{n-1}),$$

then $-1 < \eta_n < 0$ for all large $n$. Moreover, if $\xi > 1$ and $\eta < 0$, then $-1 < \eta_n < 0$ for all $n > 0$.

Proof We have

$$\eta_{n+1} = (q_{n-1}\eta - p_{n-1})/(p_n - q_n\eta).$$

Hence

$$\begin{aligned} \theta_{n+1} := q_n\eta_{n+1} + q_{n-1} &= (p_nq_{n-1} - p_{n-1}q_n)/(p_n - q_n\eta)\\ &= (-1)^{n+1}/(p_n - q_n\eta)\\ &= (-1)^n/q_n(\eta - p_n/q_n). \end{aligned}$$

Since $p_n/q_n \to \xi \ne \eta$ and $q_n \to \infty$, it follows that $\theta_n \to 0$. Since

$$\eta_{n+1} = -(q_{n-1} - \theta_{n+1})/q_n,$$

we conclude that $-1 < \eta_{n+1} < 0$ for all large $n$.

Suppose now that $\xi > 1$ and $\eta < 0$. It is readily verified that $\eta_n = a_n + 1/\eta_{n+1}$. But $a_n = \lfloor \xi_n \rfloor \ge 1$ for all $n \ge 0$. Consequently $\eta_n < 0$ implies $1/\eta_{n+1} < -1$ and thus $-1 < \eta_{n+1} < 0$. Since $\eta_0 < 0$, it follows by induction that $-1 < \eta_n < 0$ for all $n > 0$.


The complete quotients of a real number may be characterized in the following 
way: 


Proposition 1 If $\eta > 1$ and

$$\xi = (p\eta + p')/(q\eta + q'),$$

where $p, q, p', q'$ are integers such that $pq' - p'q = \pm 1$ and $q > q' > 0$, then $\eta$ is a complete quotient of $\xi$ and $p'/q'$, $p/q$ are corresponding consecutive convergents of $\xi$.


Proof The relation $pq' - p'q = \pm 1$ implies that $p$ and $q$ are relatively prime. Since $q > 0$, $p/q$ has a finite continued fraction expansion

$$p/q = [a_0, a_1, \ldots, a_{n-1}] = p_{n-1}/q_{n-1}$$

and $q = q_{n-1}$, $p = p_{n-1}$. In fact, since $q > 1$, we have $n > 1$, $a_{n-1} \ge 2$ and $q_{n-1} > q_{n-2}$. From

$$p_{n-1}q_{n-2} - p_{n-2}q_{n-1} = (-1)^n = \varepsilon(pq' - p'q),$$

where $\varepsilon = \pm 1$, we obtain

$$p_{n-1}(q_{n-2} - \varepsilon q') = q_{n-1}(p_{n-2} - \varepsilon p').$$

Hence $q_{n-1}$ divides $q_{n-2} - \varepsilon q'$. Since $0 < q_{n-2} < q_{n-1}$ and $0 < q' < q_{n-1}$, it follows that $q' = q_{n-2}$ if $\varepsilon = 1$ and $q' = q_{n-1} - q_{n-2}$ if $\varepsilon = -1$. Hence $p' = p_{n-2}$ if $\varepsilon = 1$ and $p' = p_{n-1} - p_{n-2}$ if $\varepsilon = -1$. Thus

$$\xi = (p_{n-1}\eta + p_{n-2})/(q_{n-1}\eta + q_{n-2}),$$

resp.

$$\xi = (p_{n-1}\eta + p_{n-1} - p_{n-2})/(q_{n-1}\eta + q_{n-1} - q_{n-2}).$$

Since $\eta > 1$, its continued fraction expansion has the form $[a_n, a_{n+1}, \ldots]$, where $a_n \ge 1$. It follows that $\xi$ has the continued fraction expansion

$$[a_0, a_1, \ldots, a_{n-1}, a_n, \ldots], \quad \text{resp.} \quad [a_0, a_1, \ldots, a_{n-1} - 1, 1, a_n, \ldots].$$

In either case $p'/q'$ and $p/q$ are consecutive convergents of $\xi$ and $\eta$ is the corresponding complete quotient.


A complex number $\xi$ is said to be equivalent to a complex number $\omega$ if there exist integers $a, b, c, d$ with $ad - bc = \pm 1$ such that

$$\xi = (a\omega + b)/(c\omega + d),$$

and properly equivalent if actually $ad - bc = 1$. Then $\omega$ is also equivalent, resp. properly equivalent, to $\xi$, since

$$\omega = (d\xi - b)/(-c\xi + a).$$

By taking $a = d = 1$ and $b = c = 0$, we see that any complex number $\xi$ is properly equivalent to itself. It is not difficult to verify also that if $\xi$ is equivalent to $\omega$ and $\omega$ equivalent to $\eta$, then $\xi$ is equivalent to $\eta$, and the same holds with ‘equivalence’ replaced by ‘proper equivalence’. Thus equivalence and proper equivalence are indeed ‘equivalence relations’.

For any coprime integers $b, d$, there exist integers $a, c$ such that $ad - bc = 1$. Since

$$b/d = (a \cdot 0 + b)/(c \cdot 0 + d),$$

it follows that any rational number is properly equivalent to $0$, and hence any two rational numbers are properly equivalent. The situation is more interesting for irrational numbers:


Proposition 2 Two irrational numbers $\xi$, $\eta$ are equivalent if and only if their continued fraction expansions $[a_0, a_1, a_2, \ldots]$, $[b_0, b_1, b_2, \ldots]$ have the same ‘tails’, i.e. there exist integers $m \ge 0$ and $n \ge 0$ such that

$$a_{m+k} = b_{n+k} \quad \text{for all } k \ge 0.$$

Proof If the continued fraction expansions of $\xi$ and $\eta$ have the same tails, then some complete quotient $\xi_m$ of $\xi$ coincides with some complete quotient $\eta_n$ of $\eta$. But $\xi$ is equivalent to $\xi_m$, since $\xi = (p_{m-1}\xi_m + p_{m-2})/(q_{m-1}\xi_m + q_{m-2})$ and $p_{m-1}q_{m-2} - p_{m-2}q_{m-1} = (-1)^m$, and similarly $\eta$ is equivalent to $\eta_n$. Hence $\xi$ and $\eta$ are equivalent.

Suppose on the other hand that $\xi$ and $\eta$ are equivalent. Then

$$\eta = (a\xi + b)/(c\xi + d)$$

for some integers $a, b, c, d$ such that $ad - bc = \pm 1$. By changing the signs of all four we may suppose that $c\xi + d > 0$. From the relation

$$\xi = (p_{n-1}\xi_n + p_{n-2})/(q_{n-1}\xi_n + q_{n-2})$$

between $\xi$ and its complete quotient $\xi_n$, it follows that

$$\eta = (\alpha_n\xi_n + \beta_n)/(\gamma_n\xi_n + \delta_n),$$

where (writing Greek letters to avoid a clash with the partial quotients $a_n$)

$$\alpha_n = ap_{n-1} + bq_{n-1}, \quad \beta_n = ap_{n-2} + bq_{n-2},$$
$$\gamma_n = cp_{n-1} + dq_{n-1}, \quad \delta_n = cp_{n-2} + dq_{n-2},$$

and hence

$$\alpha_n\delta_n - \beta_n\gamma_n = (ad - bc)(p_{n-1}q_{n-2} - p_{n-2}q_{n-1}) = \pm 1.$$

The inequalities

$$|q_{n-1}\xi - p_{n-1}| < 1/q_n, \qquad |q_{n-2}\xi - p_{n-2}| < 1/q_{n-1}$$

imply that

$$|\gamma_n - (c\xi + d)q_{n-1}| < |c|/q_n, \qquad |\delta_n - (c\xi + d)q_{n-2}| < |c|/q_{n-1}.$$

Since $c\xi + d > 0$, $q_{n-1} > q_{n-2}$ and $q_n \to \infty$ as $n \to \infty$, it follows that $\gamma_n > \delta_n > 0$ for sufficiently large $n$. Then, by Proposition 1, $\xi_n$ is a complete quotient also of $\eta$. Thus the continued fraction expansions of $\xi$ and $\eta$ have a common tail.


2 Diophantine Approximation 


The subject of Diophantine approximation is concerned with finding integer or 
rational solutions for systems of inequalities. For problems in one dimension the 
continued fraction algorithm is a most helpful tool, as we will now see. 


Proposition 3 Let $p_n/q_n$ $(n \ge 1)$ be a convergent of the real number $\xi$. If $p$, $q$ are integers such that $0 < q \le q_n$ and $p \ne p_n$ if $q = q_n$, then

$$|q\xi - p| \ge |q_{n-1}\xi - p_{n-1}| > |q_n\xi - p_n|$$

and

$$|\xi - p/q| > |\xi - p_n/q_n|.$$

Proof It follows from (3) that the simultaneous linear equations

$$\lambda p_{n-1} + \mu p_n = p, \qquad \lambda q_{n-1} + \mu q_n = q$$

have integer solutions, namely

$$\lambda = (-1)^{n-1}(p_nq - q_np), \qquad \mu = (-1)^n(p_{n-1}q - q_{n-1}p).$$

The hypotheses on $p$, $q$ imply that $\lambda \ne 0$. If $\mu = 0$, then

$$|q\xi - p| = |\lambda(q_{n-1}\xi - p_{n-1})| \ge |q_{n-1}\xi - p_{n-1}|.$$

Thus we now assume $\mu \ne 0$. Since $q \le q_n$, $\lambda$ and $\mu$ cannot both be positive and hence, since $q > 0$, $\lambda\mu < 0$. Then

$$q\xi - p = \lambda(q_{n-1}\xi - p_{n-1}) + \mu(q_n\xi - p_n)$$

and both terms on the right have the same sign, because $q_{n-1}\xi - p_{n-1}$ and $q_n\xi - p_n$ have opposite signs. Hence

$$|q\xi - p| = |\lambda(q_{n-1}\xi - p_{n-1})| + |\mu(q_n\xi - p_n)| \ge |q_{n-1}\xi - p_{n-1}|.$$

This proves the first statement of the proposition. The second statement follows, since

$$|\xi - p/q| = q^{-1}|q\xi - p| > q^{-1}|q_n\xi - p_n| = (q_n/q)|\xi - p_n/q_n| \ge |\xi - p_n/q_n|.$$


To illustrate the application of Proposition 3, consider the continued fraction expansion of $\pi = 3.14159265358\ldots$. We easily find that it begins $[3, 7, 15, 1, 292, \ldots]$. It follows that the first five convergents of $\pi$ are

$$3/1,\quad 22/7,\quad 333/106,\quad 355/113,\quad 103993/33102.$$

Using the inequality $|\xi - p_n/q_n| < 1/q_nq_{n+1}$ and choosing $n = 3$ so that $a_{n+1}$ is large, we obtain

$$0 < 355/113 - \pi < 0.000000267\ldots.$$

The approximation 355/113 to $\pi$ was first given by the Chinese mathematician Zu Chongzhi in the 5th century A.D. Proposition 3 shows that it is a better approximation to $\pi$ than any other rational number with denominator $\le 113$.
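The expansion and the convergents quoted above are easy to reproduce with exact rational arithmetic. The Python sketch below is illustrative only: it stands in for $\pi$ by a 36-digit decimal (an assumption; the first few partial quotients of this rational coincide with those of $\pi$, later ones need not), it uses the recurrences $p_n = a_np_{n-1} + p_{n-2}$, $q_n = a_nq_{n-1} + q_{n-2}$ with $p_{-1} = 1$, $q_{-1} = 0$, $p_{-2} = 0$, $q_{-2} = 1$, and the helper names `partial_quotients` and `convergents` are our own.

```python
from fractions import Fraction

# Assumption: a 36-digit rational stand-in for pi (not pi itself).
PI = Fraction("3.14159265358979323846264338327950288")


def partial_quotients(x, count):
    """First `count` partial quotients a_0, a_1, ... of a rational x."""
    out = []
    for _ in range(count):
        a = x.numerator // x.denominator  # floor of x
        out.append(a)
        if x == a:  # the expansion of a rational terminates
            break
        x = 1 / (x - a)  # next complete quotient
    return out


def convergents(quotients):
    """Convergents (p_n, q_n) via p_n = a_n p_{n-1} + p_{n-2}, likewise for q_n."""
    p_prev, p, q_prev, q = 0, 1, 1, 0  # p_{-2}, p_{-1}, q_{-2}, q_{-1}
    out = []
    for a in quotients:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        out.append((p, q))
    return out


a = partial_quotients(PI, 5)
conv = convergents(a)
print(a)     # [3, 7, 15, 1, 292]
print(conv)  # [(3, 1), (22, 7), (333, 106), (355, 113), (103993, 33102)]

# The bound 0 < p_3/q_3 - pi < 1/(q_3 q_4) used in the text:
err = Fraction(355, 113) - PI
print(0 < err < Fraction(1, 113 * 33102))  # True
```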

In general, a rational number $p'/q'$, where $p', q'$ are integers and $q' > 0$, may be said to be a best approximation to a real number $\xi$ if

$$|\xi - p/q| > |\xi - p'/q'|$$

for all different rational numbers $p/q$ whose denominator $q$ satisfies $0 < q \le q'$. Thus Proposition 3 says that any convergent $p_n/q_n$ $(n \ge 1)$ of $\xi$ is a best approximation of $\xi$. However, these are not the only best approximations. It may be shown that, if $p_{n-2}/q_{n-2}$ and $p_{n-1}/q_{n-1}$ are consecutive convergents of $\xi$, then any rational number of the form

$$(cp_{n-1} + p_{n-2})/(cq_{n-1} + q_{n-2}),$$

where $c$ is an integer such that $a_n/2 < c \le a_n$, is a best approximation of $\xi$. Furthermore, every best approximation of $\xi$ has this form if, when $a_n$ is even, one allows also $c = a_n/2$.

It follows that 355/113 is a better approximation to $\pi$ than any other rational number with denominator less than 16604, since $292/2 = 146$ and $146 \times 113 + 106 = 16604$.
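A brute-force check of this claim is cheap. The sketch assumes the same kind of 36-digit rational stand-in for $\pi$ and uses the fact that, for a fixed denominator $q$, only the nearest numerator $\mathrm{round}(q\pi)$ can compete; the variable names are ours.

```python
from fractions import Fraction

# Assumption: a 36-digit rational stand-in for pi.
PI = Fraction("3.14159265358979323846264338327950288")
target = Fraction(355, 113)
best_err = abs(PI - target)

# For each denominator q, only the nearest numerator round(q * pi) can compete.
rivals = []
for q in range(1, 16604):
    r = Fraction(round(PI * q), q)
    if r != target and abs(PI - r) <= best_err:
        rivals.append(r)
print(rivals)  # []

# At q = 16604 = 146 * 113 + 106 a closer fraction does appear.
r = Fraction(round(PI * 16604), 16604)
print(r, abs(PI - r) < best_err)  # 52163/16604 True
```

The last two lines show that the bound 16604 cannot be raised. The fraction $52163/16604 = (146 \cdot 355 + 333)/(146 \cdot 113 + 106)$, which is the case $c = a_n/2$, is already slightly closer to $\pi$.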
The complete continued fraction expansion of $\pi$ is not known. However, it was discovered by Cotes (1714) and then proved by Euler (1737) that the complete continued fraction expansion of $e = 2.718281828459\ldots$ is given by $e - 1 = [1, 1, 2, 1, 1, 4, 1, 1, 6, \ldots]$.
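The pattern for $e - 1$ can be checked against a rational approximation of $e - 1 = \sum_{k \ge 1} 1/k!$. In this sketch the series is truncated after 30 terms. That choice is an assumption, and it leaves an error below $10^{-30}$, which is far smaller than the first dozen partial quotients require. The helper names are ours.

```python
from fractions import Fraction
from math import factorial


def e_minus_one_approx(terms=30):
    """Rational approximation of e - 1 by the truncated series sum_{k=1}^{terms} 1/k!."""
    return sum(Fraction(1, factorial(k)) for k in range(1, terms + 1))


def partial_quotients(x, count):
    """First `count` partial quotients of a rational x."""
    out = []
    for _ in range(count):
        a = x.numerator // x.denominator
        out.append(a)
        if x == a:
            break
        x = 1 / (x - a)
    return out


quotients = partial_quotients(e_minus_one_approx(), 12)
print(quotients)  # [1, 1, 2, 1, 1, 4, 1, 1, 6, 1, 1, 8]
```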

The preceding results may also be applied to the construction of calendars. The solar year has a length of about 365.24219 mean solar days. The continued fraction expansion of $\lambda = (0.24219)^{-1}$ begins $[4, 7, 1, 3, 24, \ldots]$. Hence the first five convergents of $\lambda$ are

$$4/1,\quad 29/7,\quad 33/8,\quad 128/31,\quad 3105/752.$$
It follows that
$$0 < 128/31 - \lambda < 0.0000428$$

and 128/31 is a better approximation to $\lambda$ than any other rational number with denominator less than 380. The Julian calendar, by adding a day every 4 years, estimated the year at 365.25 days. The Gregorian calendar, by adding 97 days every 400 years, estimates the year at 365.2425 days. Our analysis shows that, if we added instead 31 days every 128 years, we would obtain the much more precise estimate of 365.2421875 days.
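The calendar computation becomes exact if $\lambda$ is taken to be the rational number $100000/24219$. The sketch below (helper names are ours) recomputes the partial quotients, the convergents and the error of 128/31. It also runs a brute-force check of the claim about denominators below 380 and prints the year length implied by 31 leap days in 128 years.

```python
from fractions import Fraction


def partial_quotients(x, count):
    """First `count` partial quotients of a rational x."""
    out = []
    for _ in range(count):
        a = x.numerator // x.denominator
        out.append(a)
        if x == a:
            break
        x = 1 / (x - a)
    return out


def convergents(quotients):
    """Convergents (p_n, q_n) from the usual three-term recurrences."""
    p_prev, p, q_prev, q = 0, 1, 1, 0
    out = []
    for a in quotients:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        out.append((p, q))
    return out


lam = Fraction(100000, 24219)      # lambda = (0.24219)^(-1), exactly
a = partial_quotients(lam, 5)      # [4, 7, 1, 3, 24]
conv = convergents(a)              # [(4, 1), (29, 7), (33, 8), (128, 31), (3105, 752)]
gap = Fraction(128, 31) - lam      # 32/750789, about 4.26e-5

# No other fraction with denominator below 380 is at least as close to lambda.
closer = []
for q in range(1, 380):
    r = Fraction(round(lam * q), q)
    if r != Fraction(128, 31) and abs(lam - r) <= gap:
        closer.append(r)

print(a, conv, float(gap), closer)
print(float(365 + Fraction(31, 128)))  # 365.2421875
```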

Best approximations also find an application in the selection of gear ratios, and con- 
tinued fractions were already used for this purpose by Huygens (1682) in constructing 
his planetarium (a mechanical model for the solar system). 

The next proposition describes another way in which the continued fraction expan- 
sion provides good rational approximations. 


Proposition 4 If $p, q$ are coprime integers with $q > 0$ such that, for some real number $\xi$,

$$|\xi - p/q| < 1/2q^2,$$
then $p/q$ is a convergent of $\xi$.
Proof Let $p_n/q_n$ be the convergents of $\xi$ and assume that $p/q$ is not a convergent. We show first that $q < q_N$ for some $N > 0$. This is obvious if $\xi$ is irrational. If $\xi = p_N/q_N$ is rational, then
$$1/q_N \le |qp_N - pq_N|/q_N = |q\xi - p| < 1/2q.$$
Hence $q < q_N$ and $N > 0$.
It follows that $q_{n-1} \le q < q_n$ for some $n > 0$. By Proposition 3,
$$|q_{n-1}\xi - p_{n-1}| \le |q\xi - p| < 1/2q.$$

Hence

$$1/qq_{n-1} \le |qp_{n-1} - pq_{n-1}|/qq_{n-1} = |p_{n-1}/q_{n-1} - p/q| \le |p_{n-1}/q_{n-1} - \xi| + |\xi - p/q| < 1/2qq_{n-1} + 1/2q^2.$$

But this implies $q < q_{n-1}$, which is a contradiction.
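Proposition 4 can also be tested empirically: every reduced $p/q$ within $1/2q^2$ of $\xi$ must appear among the convergents. The sketch below uses a 36-digit rational stand-in for $\pi$ as test value (an assumption, since only a rational can be handled exactly). It scans all $q \le 5000$ and compares the hits with the convergents. The function names are ours. The converse of the proposition fails, as the output shows: the convergent 333/106 does not satisfy the inequality.

```python
from fractions import Fraction
from math import ceil, floor, gcd

XI = Fraction("3.14159265358979323846264338327950288")  # assumed test value


def convergent_set(x, count):
    """The first `count` convergents p_n/q_n of the rational x, as Fractions."""
    p_prev, p, q_prev, q = 0, 1, 1, 0
    found = set()
    for _ in range(count):
        a = x.numerator // x.denominator
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        found.add(Fraction(p, q))
        if x == a:
            break
        x = 1 / (x - a)
    return found


def legendre_hits(x, max_q):
    """Reduced p/q with 0 < q <= max_q and |x - p/q| < 1/(2 q^2)."""
    hits = []
    for q in range(1, max_q + 1):
        bound = Fraction(1, 2 * q * q)
        # |qx - p| < 1/(2q) <= 1/2 forces p = floor(qx) or ceil(qx)
        for p in sorted({floor(q * x), ceil(q * x)}):
            if gcd(p, q) == 1 and abs(x - Fraction(p, q)) < bound:
                hits.append(Fraction(p, q))
    return hits


conv = convergent_set(XI, 8)
hits = legendre_hits(XI, 5000)
print(hits)                          # [Fraction(3, 1), Fraction(22, 7), Fraction(355, 113)]
print(all(h in conv for h in hits))  # True, as Proposition 4 predicts
```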




As an application of Proposition 4 we prove 


Proposition 5 Let $d$ be a positive integer which is not a square and $m$ an integer such that $0 < m^2 < d$. If $x, y$ are positive integers such that

$$x^2 - dy^2 = m,$$
then $x/y$ is a convergent of the irrational number $\sqrt{d}$.
Proof Suppose first that $m > 0$. Then $x/y > \sqrt{d}$ and
$$0 < x/y - \sqrt{d} = m/(xy + y^2\sqrt{d}) < \sqrt{d}/2y^2\sqrt{d} = 1/2y^2.$$
Hence $x/y$ is a convergent of $\sqrt{d}$, by Proposition 4.
Suppose next that $m < 0$. Then $y/x > 1/\sqrt{d}$ and
$$0 < y/x - 1/\sqrt{d} = -m/d(xy + x^2/\sqrt{d}) < 1/\sqrt{d}(xy + x^2/\sqrt{d}) < 1/2x^2.$$

Hence $y/x$ is a convergent of $1/\sqrt{d}$. But, since $1/\sqrt{d} = 0 + 1/\sqrt{d}$, the convergents of $1/\sqrt{d}$ are $0/1$ and the reciprocals of the convergents of $\sqrt{d}$.


In the next section we will show that the continued fraction expansion of $\sqrt{d}$ has a particularly simple form.

It was shown by Vahlen (1895) that at least one of any two consecutive convergents of $\xi$ satisfies the inequality of Proposition 4. Indeed, since consecutive convergents lie on opposite sides of $\xi$,

$$|p_n/q_n - \xi| + |p_{n-1}/q_{n-1} - \xi| = |p_n/q_n - p_{n-1}/q_{n-1}| = 1/q_nq_{n-1} \le 1/2q_n^2 + 1/2q_{n-1}^2,$$

with equality only if $q_n = q_{n-1}$. This proves the assertion, except when $n = 1$ and $q_1 = q_0 = 1$. But in this case $a_1 = 1$, $1 < \xi_1 < 2$ and hence

$$|\xi - p_1/q_1| = |\xi - a_0 - 1| = 1 - \xi_1^{-1} < 1/2.$$


It was shown by Borel (1903) that at least one of any three consecutive convergents of $\xi$ satisfies the sharper inequality

$$|\xi - p/q| < 1/\sqrt{5}q^2.$$

In fact this is obtained by taking $r = 1$ in the following more general result, due to Forder (1963) and Wright (1964).


Proposition 6 Let $\xi$ be an irrational number with the continued fraction expansion $[a_0, a_1, \ldots]$ and the convergents $p_n/q_n$. If, for some positive integer $r$,

$$|\xi - p_n/q_n| \ge 1/(r^2 + 4)^{1/2}q_n^2 \quad\text{for } n = m - 1, m, m + 1,$$
then $a_{m+1} < r$.

Proof If we put $s = (r^2 + 4)^{1/2}/2$, then $s$ is irrational. For otherwise $2s$ would be an integer and from $(2s + r)(2s - r) = 4$ we would obtain $2s + r = 4$, $2s - r = 1$ and hence $r = 3/2$, which is a contradiction.

By the hypotheses of the proposition,

$$1/q_{m-1}q_m = |p_{m-1}/q_{m-1} - p_m/q_m| = |\xi - p_{m-1}/q_{m-1}| + |\xi - p_m/q_m| \ge (q_{m-1}^{-2} + q_m^{-2})/2s$$
and hence
$$q_{m-1}^2 - 2sq_{m-1}q_m + q_m^2 \le 0.$$

Furthermore, this inequality also holds when $q_{m-1}, q_m$ are replaced by $q_m, q_{m+1}$. Consequently $q_{m-1}/q_m$ and $q_{m+1}/q_m$ both satisfy the inequality $t^2 - 2st + 1 \le 0$. Since

$$t^2 - 2st + 1 = (t - s + r/2)(t - s - r/2),$$
it follows that
$$s - r/2 < q_{m-1}/q_m < q_{m+1}/q_m < s + r/2,$$

the first and last inequalities being strict because $s$ is irrational. Hence

$$a_{m+1} = q_{m+1}/q_m - q_{m-1}/q_m < s + r/2 - (s - r/2) = r.$$


It follows from Proposition 6 with $r = 1$ that, for any irrational number $\xi$, there exist infinitely many rational numbers $p/q = p_n/q_n$ such that

$$|\xi - p/q| < 1/\sqrt{5}q^2.$$

Here the constant $\sqrt{5}$ is best possible. For take any $c > \sqrt{5}$. If there exists a rational number $p/q$, with $q > 0$ and $(p, q) = 1$, such that

$$|\xi - p/q| < 1/cq^2,$$
then $p/q$ is a convergent of $\xi$, by Proposition 4. But for any convergent $p_n/q_n$ we have
$$|\xi - p_n/q_n| = 1/q_n(q_n\xi_{n+1} + q_{n-1}).$$
If we take $\xi = \tau := (1 + \sqrt{5})/2$, then also $\xi_{n+1} = \tau$ and $p_n = q_{n+1}$. Hence
$$|\tau - q_{n+1}/q_n| = 1/q_n^2(\tau + q_{n-1}/q_n),$$

where $\tau + q_{n-1}/q_n \to \tau + \tau^{-1} = \sqrt{5}$, since $q_n/q_{n-1} \to \tau$. Thus, for any $c > \sqrt{5}$, there exist at most finitely many rational numbers $p/q$ such that

$$|\tau - p/q| < 1/cq^2.$$
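The limit $\tau + \tau^{-1} = \sqrt{5}$ behind the optimality of $\sqrt{5}$ can be observed numerically: $q_n^2|\tau - p_n/q_n| = 1/(\tau + q_{n-1}/q_n)$ tends to $1/\sqrt{5} \approx 0.4472$. The sketch below uses Python's `decimal` module with 60 significant digits (our choice of precision) and takes consecutive Fibonacci numbers as the convergents of $\tau = [1, 1, 1, \ldots]$. The function name is ours.

```python
from decimal import Decimal, getcontext

getcontext().prec = 60  # ample precision for q_n up to about 1e8


def golden_ratio_errors(count):
    """q_n^2 |tau - p_n/q_n| for the first `count` convergents of tau = [1, 1, 1, ...]."""
    tau = (1 + Decimal(5).sqrt()) / 2
    p, q = 1, 1              # p_0/q_0 = 1/1
    p_next, q_next = 2, 1    # p_1/q_1 = 2/1
    out = []
    for _ in range(count):
        out.append(q * q * abs(tau - Decimal(p) / Decimal(q)))
        # every partial quotient is 1, so p_{n+1} = p_n + p_{n-1}, likewise for q
        p, q, p_next, q_next = p_next, q_next, p_next + p, q_next + q
    return out


errors = golden_ratio_errors(40)
limit = 1 / Decimal(5).sqrt()
print(errors[:4])
print(errors[-1], limit)
```

Take any $c > \sqrt{5}$, say $c = 2.3$. From some point on, the products stay above $1/c$, so only finitely many convergents of $\tau$ satisfy $|\tau - p/q| < 1/cq^2$.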
It follows from Proposition 6 with $r = 2$ that if

$$|\xi - p_n/q_n| \ge 1/\sqrt{8}q_n^2 \quad\text{for all large } n,$$
then $a_n = 1$ for all large $n$. The constant $\sqrt{8}$ is again best possible, since a similar argument to that just given shows that if $\sigma := 1 + \sqrt{2} = [2, 2, \ldots]$ then, for any $c > \sqrt{8}$, there exist at most finitely many rational numbers $p/q$ such that

$$|\sigma - p/q| < 1/cq^2.$$
It follows from Proposition 6 with $r = 3$ that if
$$|\xi - p_n/q_n| \ge 1/\sqrt{13}q_n^2 \quad\text{for all large } n,$$

then $a_n \in \{1, 2\}$ for all large $n$.
For any irrational $\xi$, with continued fraction expansion $[a_0, a_1, \ldots]$ and convergents $p_n/q_n$, put
$$M(\xi) = \limsup_{n \to \infty} q_n^{-1}|q_n\xi - p_n|^{-1}.$$
It follows from Proposition 2 that $M(\xi) = M(\eta)$ if $\xi$ and $\eta$ are equivalent. The results just established show that $M(\xi) \ge \sqrt{5}$ for every $\xi$. If $M(\xi) < \sqrt{8}$, then $a_n = 1$ for all large $n$; hence $\xi$ is equivalent to $\tau$ and $M(\xi) = M(\tau) = \sqrt{5}$. If $M(\xi) < \sqrt{13}$, then $a_n \in \{1, 2\}$ for all large $n$.
An irrational number $\xi$ is said to be badly approximable if $M(\xi) < \infty$. The inequalities
$$a_{n+2}/q_nq_{n+2} < |\xi - p_n/q_n| < 1/q_nq_{n+1}$$
imply
$$a_{n+1} < q_{n+1}/q_n < q_n^{-1}|q_n\xi - p_n|^{-1}$$

and

$$q_n^{-1}|q_n\xi - p_n|^{-1} < q_{n+2}/a_{n+2}q_n \le q_{n+1}/q_n + 1 \le a_{n+1} + 2.$$

Hence $\xi$ is badly approximable if and only if its partial quotients $a_n$ are bounded.
It is obvious that $\xi$ is badly approximable if there exists a constant $c > 0$ such that

$$|\xi - p/q| > c/q^2$$

for every rational number $p/q$. Conversely, if $\xi$ is badly approximable, then there exists such a constant $c > 0$. This is clear when $p$ and $q$ are coprime integers, since if $p/q$ is not a convergent of $\xi$ then, by Proposition 4,

$$|\xi - p/q| \ge 1/2q^2.$$
On the other hand, if $p = \lambda p'$, $q = \lambda q'$, where $p', q'$ are coprime, then
$$|\xi - p/q| = |\xi - p'/q'| > c/q'^2 = \lambda^2c/q^2 \ge c/q^2.$$


Some of the applications of badly approximable numbers stem from the following characterization: a real number $\theta$ is badly approximable if and only if there exists a constant $c' > 0$ such that

$$|e^{2\pi iq\theta} - 1| > c'/q \quad\text{for all } q \in \mathbb{N}.$$
To establish this, put $q\theta = p + \delta$, where $p \in \mathbb{Z}$ and $|\delta| \le 1/2$. Then

$$|e^{2\pi iq\theta} - 1| = 2|\sin\pi q\theta| = 2|\sin\pi\delta|$$
and the result follows from the previous characterization, since $(\sin x)/x$ decreases from 1 to $2/\pi$ as $x$ increases from 0 to $\pi/2$.
A complex number $\zeta$ is said to be a quadratic irrational if it is a root of a monic quadratic polynomial $t^2 + rt + s$ with rational coefficients $r, s$, but is not itself rational. Since $\zeta \notin \mathbb{Q}$, the rational numbers $r, s$ are uniquely determined by $\zeta$.

Equivalently, $\zeta$ is a quadratic irrational if it is a root of a quadratic polynomial

$$f(t) = At^2 + Bt + C$$

with integer coefficients $A, B, C$ such that $B^2 - 4AC$ is not the square of an integer. The integers $A, B, C$ are uniquely determined up to a common factor and are uniquely determined up to sign if we require that they have greatest common divisor 1. The corresponding integer $D = B^2 - 4AC$ is then uniquely determined and is called the discriminant of $\zeta$. A quadratic irrational is real if and only if its discriminant is positive.

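It is readily verified that if a quadratic irrational $\zeta$ is equivalent to a complex number $\omega$, i.e. if

$$\zeta = (\alpha\omega + \beta)/(\gamma\omega + \delta),$$

where $\alpha, \beta, \gamma, \delta \in \mathbb{Z}$ and $\alpha\delta - \beta\gamma = \pm 1$, then $\omega$ is also a quadratic irrational. Moreover, if $\zeta$ is a root of the quadratic polynomial $f(t) = At^2 + Bt + C$, where $A, B, C$ are integers with greatest common divisor 1, then $\omega$ is a root of the quadratic polynomial

$$g(t) = A't^2 + B't + C',$$

where
$$A' = \alpha^2A + \alpha\gamma B + \gamma^2C,$$
$$B' = 2\alpha\beta A + (\alpha\delta + \beta\gamma)B + 2\gamma\delta C,$$
$$C' = \beta^2A + \beta\delta B + \delta^2C,$$
and hence
$$B'^2 - 4A'C' = B^2 - 4AC = D.$$
Since

$$A = \delta^2A' - \gamma\delta B' + \gamma^2C',$$
$$B = -2\beta\delta A' + (\alpha\delta + \beta\gamma)B' - 2\alpha\gamma C',$$
$$C = \beta^2A' - \alpha\beta B' + \alpha^2C',$$

$A', B', C'$ also have greatest common divisor 1.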
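Both the substitution formulas and the invariance of the discriminant can be spot-checked numerically. In the sketch below the functions simply transcribe the formulas for $A', B', C'$ and for $A, B, C$ given above. The function names are ours. The example $\zeta = \sqrt{2}$ with $(\alpha, \beta, \gamma, \delta) = (2, 1, 1, 1)$ is chosen only for illustration.

```python
def transform(A, B, C, alpha, beta, gamma, delta):
    """(A', B', C') such that omega is a root of A't^2 + B't + C' when
    zeta = (alpha*omega + beta)/(gamma*omega + delta) is a root of At^2 + Bt + C."""
    A1 = alpha * alpha * A + alpha * gamma * B + gamma * gamma * C
    B1 = 2 * alpha * beta * A + (alpha * delta + beta * gamma) * B + 2 * gamma * delta * C
    C1 = beta * beta * A + beta * delta * B + delta * delta * C
    return A1, B1, C1


def untransform(A1, B1, C1, alpha, beta, gamma, delta):
    """Inverse formulas recovering (A, B, C) from (A', B', C')."""
    A = delta * delta * A1 - gamma * delta * B1 + gamma * gamma * C1
    B = -2 * beta * delta * A1 + (alpha * delta + beta * gamma) * B1 - 2 * alpha * gamma * C1
    C = beta * beta * A1 - alpha * beta * B1 + alpha * alpha * C1
    return A, B, C


# zeta = sqrt(2) is a root of t^2 - 2; alpha*delta - beta*gamma = 2 - 1 = 1.
A1, B1, C1 = transform(1, 0, -2, 2, 1, 1, 1)
print((A1, B1, C1), B1 * B1 - 4 * A1 * C1)   # (2, 0, -1) 8
print(untransform(A1, B1, C1, 2, 1, 1, 1))   # (1, 0, -2)
```

Here $\omega = 1/\sqrt{2}$ is a root of $2t^2 - 1$, and indeed $(2/\sqrt{2} + 1)/(1/\sqrt{2} + 1) = \sqrt{2}$.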
If $\zeta$ is a quadratic irrational, we define the conjugate $\zeta'$ of $\zeta$ to be the other root of the quadratic polynomial $f(t)$ which has $\zeta$ as a root. If

$$\zeta = (\alpha\omega + \beta)/(\gamma\omega + \delta),$$

where $\alpha, \beta, \gamma, \delta \in \mathbb{Z}$ and $\alpha\delta - \beta\gamma = \pm 1$, then evidently
$$\zeta' = (\alpha\omega' + \beta)/(\gamma\omega' + \delta).$$

Suppose now that $\zeta = \xi$ is real and that the integers $A, B, C$ are uniquely determined by requiring not only $(A, B, C) = 1$ but also $A > 0$. The real quadratic irrational $\xi$ is said to be reduced if $\xi > 1$ and $-1 < \xi' < 0$. If $\xi$ is reduced then, since $\xi > \xi'$, we must have

$$\xi = (-B + \sqrt{D})/2A,\quad \xi' = (-B - \sqrt{D})/2A.$$
Thus the inequalities $\xi > 1$ and $-1 < \xi' < 0$ imply
$$0 < \sqrt{D} + B < 2A < \sqrt{D} - B.$$

Conversely, if the coefficients $A, B, C$ of $f(t)$ satisfy these inequalities, where $D = B^2 - 4AC > 0$, then one of the roots of $f(t)$ is reduced. For $B < 0 < A$ and so the roots $\xi, \xi'$ of $f(t)$ have opposite signs. If $\xi$ is the positive root, then $\xi$ and $\xi'$ are given by the preceding formulas and hence $\xi > 1$, $-1 < \xi' < 0$. It should be noted also that if $\xi$ is reduced, then $B^2 < D$ and hence $C < 0$.
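Because $\sqrt{D}$ is irrational, the inequalities $0 < \sqrt{D} + B < 2A < \sqrt{D} - B$ can be tested exactly with integers. With $g = \lfloor\sqrt{D}\rfloor$ they become $B \ge -g$, $2A + B \le g$ and $2A - B \ge g + 1$. The sketch below lists, for a given discriminant, the polynomials $At^2 + Bt + C$ that have a reduced root. The helper name `reduced_forms` is ours, and the helper does not impose $\gcd(A, B, C) = 1$.

```python
from math import isqrt


def reduced_forms(D):
    """Triples (A, B, C) with B^2 - 4AC = D and 0 < sqrt(D) + B < 2A < sqrt(D) - B.

    Assumes D > 0 is not a perfect square; gcd(A, B, C) = 1 is not imposed.
    """
    g = isqrt(D)
    if g * g == D:
        raise ValueError("D must not be a perfect square")
    forms = []
    for B in range(-g, 0):             # sqrt(D) + B > 0  <=>  B >= -g
        for A in range(1, g + 1):
            if 2 * A + B > g:          # 2A < sqrt(D) - B  <=>  2A + B <= g
                break
            if 2 * A - B <= g:         # 2A > sqrt(D) + B  <=>  2A - B >= g + 1
                continue
            if (B * B - D) % (4 * A) == 0:
                forms.append((A, B, (B * B - D) // (4 * A)))
    return forms


print(reduced_forms(5))   # [(1, -1, -1)]: the root is tau = (1 + sqrt(5))/2
print(reduced_forms(12))  # [(1, -2, -2), (2, -2, -1)]: 1 + sqrt(3) and (1 + sqrt(3))/2
```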

We return now to continued fractions. If $\xi$ is a real quadratic irrational, then its complete quotients $\xi_n$ are all quadratic irrationals and, conversely, if some complete quotient $\xi_n$ is a quadratic irrational, then $\xi$ is also a quadratic irrational.

The continued fraction expansion $[a_0, a_1, a_2, \ldots]$ of a real number $\xi$ is said to be eventually periodic if there exist integers $m \ge 0$ and $h > 0$ such that

$$a_n = a_{n+h} \quad\text{for all } n \ge m.$$
The continued fraction expansion is then conveniently denoted by

$$[a_0, a_1, \ldots, a_{m-1}, \overline{a_m, \ldots, a_{m+h-1}}].$$

The continued fraction expansion is said to be periodic if it is eventually periodic with $m = 0$.

Equivalently, the continued fraction expansion of $\xi$ is eventually periodic if $\xi_m = \xi_{m+h}$ for some $m \ge 0$ and $h > 0$, and periodic if this holds with $m = 0$. The period of the continued fraction expansion, in either case, is the least positive integer $h$ with this property.

We are going to show that there is a close connection between real quadratic irra- 
tionals and eventually periodic continued fractions. 


Proposition 7 A real number $\xi$ is a reduced quadratic irrational if and only if its continued fraction expansion is periodic.
Moreover, if $\xi = [\overline{a_0, \ldots, a_{h-1}}]$, then $-1/\xi' = [\overline{a_{h-1}, \ldots, a_0}]$.
Proof Suppose first that $\xi = [\overline{a_0, \ldots, a_{h-1}}]$ has a periodic continued fraction expansion. Then $a_0 = a_h \ge 1$ and hence $\xi > 1$. Furthermore, since

$$\xi = (p_{h-1}\xi_h + p_{h-2})/(q_{h-1}\xi_h + q_{h-2})$$
and $\xi_h = \xi$, $\xi$ is an irrational root of the quadratic polynomial
$$f(t) = q_{h-1}t^2 + (q_{h-2} - p_{h-1})t - p_{h-2}.$$
Thus $\xi$ is a quadratic irrational. Since $f(0) = -p_{h-2} < 0$ and

$$f(-1) = q_{h-1} - q_{h-2} + p_{h-1} - p_{h-2} > 0$$

(even for $h = 1$), it follows that $-1 < \xi' < 0$. Thus $\xi$ is reduced.

If $\xi$ is a reduced quadratic irrational, then its complete quotients $\xi_n$, which are all quadratic irrationals, are also reduced, by Lemma 0 with $\eta = \xi'$. Since $\xi'_n = a_n + 1/\xi'_{n+1}$ and $-1 < \xi'_n < 0$, we have

$$a_n = \lfloor -1/\xi'_{n+1} \rfloor.$$
Thus $\xi_n, \xi'_n$ are the roots of a uniquely determined polynomial
$$f_n(t) = A_nt^2 + B_nt + C_n,$$

where $A_n, B_n, C_n$ are integers with greatest common divisor 1 and $A_n > 0$. Furthermore, $D = B_n^2 - 4A_nC_n$ is independent of $n$ and positive. Since $\xi_n$ is reduced, we have

$$\xi_n = (-B_n + \sqrt{D})/2A_n,\quad \xi'_n = (-B_n - \sqrt{D})/2A_n,$$
where
$$0 < \sqrt{D} + B_n < 2A_n < \sqrt{D} - B_n.$$

If we put $g = \lfloor\sqrt{D}\rfloor$, then $-B_n \in \{1, \ldots, g\}$ and, for a given value of $B_n$, there are at most $-B_n$ possible values for $A_n$. Consequently the number of distinct pairs $A_n, B_n$ does not exceed $1 + \cdots + g = g(g + 1)/2$. Hence we must have

$$\xi_j = \xi_k,\quad \xi'_j = \xi'_k$$

for some $j, k$ such that $0 \le j < k \le g(g + 1)/2$. If $j = 0$, this already proves that the continued fraction expansion of $\xi$ is periodic. If $j > 0$, then

$$a_{j-1} = \lfloor -1/\xi'_j \rfloor = \lfloor -1/\xi'_k \rfloor = a_{k-1}$$
and hence
$$\xi_{j-1} = a_{j-1} + 1/\xi_j = a_{k-1} + 1/\xi_k = \xi_{k-1}.$$

Repeating this argument $j$ times, we obtain $\xi_0 = \xi_{k-j}$. Thus $\xi$ has a periodic continued fraction expansion in any case.

If the period is $h$, so that $\xi = [\overline{a_0, \ldots, a_{h-1}}]$, then $\xi'_h = \xi'$ and the relation $a_n = \lfloor -1/\xi'_{n+1} \rfloor$ implies that $-1/\xi' = [\overline{a_{h-1}, \ldots, a_0}]$.
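Proposition 7 can be checked on concrete reduced numbers. The sketch writes a reduced $\xi$ as $(P + \sqrt{D})/Q$ with integers $P$ and $Q > 0$ such that $Q \mid D - P^2$. For a reduced root of $At^2 + Bt + C$ one may take $P = -B$, $Q = 2A$. This representation, the update rule and the function names are our assumptions, not the text's notation. The sketch computes one period and checks that $-1/\xi' = (P + \sqrt{D})/((D - P^2)/Q)$ has the reversed period.

```python
from math import isqrt


def periodic_expansion(P, Q, D):
    """One period of the continued fraction of xi = (P + sqrt(D))/Q.

    Assumes D > 0 is not a square, Q > 0, Q divides D - P^2 and xi is reduced,
    so that the expansion is purely periodic (Proposition 7) and every complete
    quotient keeps this form with a positive denominator.
    """
    g = isqrt(D)
    start = (P, Q)
    period = []
    while True:
        a = (P + g) // Q              # floor((P + sqrt(D))/Q), valid because Q > 0
        period.append(a)
        P = a * Q - P                 # 1/(xi - a) = (P' + sqrt(D))/Q' with
        Q = (D - P * P) // Q          # P' = aQ - P and Q' = (D - P'^2)/Q
        if (P, Q) == start:
            return period


def minus_inverse_conjugate(P, Q, D):
    """Return (P, Q'', D) with -1/xi' = (P + sqrt(D))/Q'' for xi = (P + sqrt(D))/Q."""
    return P, (D - P * P) // Q, D


# (1 + sqrt(3))/2 = (2 + sqrt(12))/4 is a reduced root of 2t^2 - 2t - 1.
xi = (2, 4, 12)
print(periodic_expansion(*xi))                            # [1, 2]
print(periodic_expansion(*minus_inverse_conjugate(*xi)))  # [2, 1]
```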
The proof of Proposition 7 shows that the period is at most $g(g + 1)/2$ and thus is certainly less than $D$. By counting the pairs of integers $A, B$ for which not only

$$0 < \sqrt{D} + B < 2A < \sqrt{D} - B,$$

but also $D \equiv B^2 \bmod 4A$, it may be shown that the period is at most $O(\sqrt{D}\log D)$. (The Landau order symbol used here is defined under 'Notations'.)


Proposition 8 A real number $\xi$ is a quadratic irrational if and only if its continued fraction expansion is eventually periodic.


Proof Suppose first that the continued fraction expansion of $\xi$ is eventually periodic. Then some complete quotient $\xi_m$ has a periodic continued fraction expansion and hence is a quadratic irrational, by Proposition 7. But this implies that $\xi$ also is a quadratic irrational.

Suppose next that $\xi$ is a quadratic irrational. We will prove that the continued fraction expansion of $\xi$ is eventually periodic by showing that some complete quotient $\xi_{n+1}$ is reduced. Since we certainly have $\xi_{n+1} > 1$, we need only show that $-1 < \xi'_{n+1} < 0$. But $\xi' \ne \xi$ and $\xi' = (p_n\xi'_{n+1} + p_{n-1})/(q_n\xi'_{n+1} + q_{n-1})$. Hence, by Lemma 0, $-1 < \xi'_{n+1} < 0$ for all large $n$.


It follows from Proposition 8 that any real quadratic irrational is badly approximable, since its partial quotients are bounded. It follows from Propositions 7 and 8 that there are only finitely many inequivalent quadratic irrationals with a given discriminant $D > 0$, since any real quadratic irrational is equivalent to a reduced one and only finitely many pairs of integers $A, B$ satisfy the inequalities

$$0 < \sqrt{D} + B < 2A < \sqrt{D} - B.$$


Proposition 8 is due to Euler and Lagrange. It was first shown by Euler (1737) that 
a real number is a quadratic irrational if its continued fraction expansion is eventually 
periodic, and the converse was proved by Lagrange (1770). Proposition 7 was first 
stated and proved by Galois (1829), although it was implicit in the work of Lagrange 
(1773) on the reduction of binary quadratic forms. Proposition 7 provides a simple 
proof of the following result due to Legendre: 


Proposition 9 For any real number $\xi$, the following two conditions are equivalent:

(i) $\xi > 1$, $\xi$ is irrational and $\xi^2$ is rational;
(ii) the continued fraction expansion of $\xi$ has the form $[a_0, \overline{a_1, \ldots, a_h}]$, where $a_h = 2a_0$ and $a_i = a_{h-i}$ for $i = 1, \ldots, h - 1$.

Proof Suppose first that (i) holds. Then $\xi$ is a quadratic irrational, since it is a root of the polynomial $t^2 - \xi^2$. The continued fraction expansion of $\xi$ cannot be periodic, by Proposition 7, since $\xi' = -\xi < -1$. However, the continued fraction expansion of $\xi_1$ is periodic, since $\xi_1 > 1$ and $1/\xi'_1 = \xi' - a_0 < -1$. Thus $\xi_1 = [\overline{a_1, \ldots, a_h}]$ for some $h \ge 1$. By Proposition 7 also,

$$-1/\xi'_1 = [\overline{a_h, \ldots, a_1}].$$
But
$$-1/\xi'_1 = \xi + a_0 = [2a_0, \overline{a_1, \ldots, a_h}].$$

Comparing this with the previous expression, we see that (ii) holds.




Suppose, conversely, that (ii) holds. Then $\xi$ is irrational, $a_0 > 0$ and hence $\xi > 1$. Moreover $\xi_1 = [\overline{a_1, \ldots, a_h}]$ is a reduced quadratic irrational and
$$-1/\xi'_1 = [\overline{a_h, \ldots, a_1}] = [2a_0, \overline{a_1, \ldots, a_h}] = a_0 + \xi.$$
Hence $\xi' = a_0 + 1/\xi'_1 = -\xi$ and $\xi^2 = -\xi\xi'$ is rational.


4 Quadratic Diophantine Equations 


We are interested in finding all integers $x, y$ such that

$$ax^2 + bxy + cy^2 + dx + ey + f = 0, \tag{6}$$
where $a, \ldots, f$ are given integers. Writing (6) as a quadratic equation for $x$,

$$ax^2 + (by + d)x + cy^2 + ey + f = 0,$$
we see that if a solution exists for some $y$, then the discriminant

$$(by + d)^2 - 4a(cy^2 + ey + f)$$
must be a perfect square. Thus
$$(b^2 - 4ac)y^2 + 2(bd - 2ae)y + d^2 - 4af = z^2$$
for some integer $z$. If we put
$$p := b^2 - 4ac,\quad q := bd - 2ae,\quad r := d^2 - 4af,$$
we have a quadratic equation for $y$,
$$py^2 + 2qy + r - z^2 = 0,$$

whose discriminant must also be a perfect square. Thus

$$q^2 - p(r - z^2) = w^2$$

for some integer $w$. Thus if (6) has a solution in integers, so also does the equation

$$w^2 - pz^2 = q^2 - pr.$$

Moreover, from all solutions in integers of the latter equation we may obtain, by retracing our steps, all solutions in integers of the original equation (6).
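The reduction can be made explicit. Completing the square gives $z = \pm(2ax + by + d)$ and $w = \pm(py + q)$. The sketch below maps every small solution of one equation to a solution of the other. The example equation and the helper names are our own choices.

```python
def pell_data(a, b, c, d, e, f):
    """p, q, r as defined in the text for a x^2 + b x y + c y^2 + d x + e y + f = 0."""
    p = b * b - 4 * a * c
    q = b * d - 2 * a * e
    r = d * d - 4 * a * f
    return p, q, r


def to_w_z(x, y, a, b, c, d, e, f):
    """Send a solution (x, y) of (6) to (w, z) with w^2 - p z^2 = q^2 - p r.

    Our explicit sign choice: z = 2ax + by + d (a square root of the
    discriminant in x) and w = py + q (a square root of the discriminant in y).
    """
    p, q, _ = pell_data(a, b, c, d, e, f)
    return p * y + q, 2 * a * x + b * y + d


# Illustrative equation (our choice): x^2 + xy - y^2 + x - y - 1 = 0
coeffs = (1, 1, -1, 1, -1, -1)
p, q, r = pell_data(*coeffs)
sols = [(x, y) for x in range(-60, 61) for y in range(-60, 61)
        if x * x + x * y - y * y + x - y - 1 == 0]
checks = [to_w_z(x, y, *coeffs) for x, y in sols]
print(p, q, r, q * q - p * r)  # 5 3 5 -16
print(all(w * w - p * z * z == q * q - p * r for w, z in checks))  # True
```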
Thus we now restrict our attention to finding all integers $x, y$ such that

$$x^2 - dy^2 = m, \tag{7}$$

where $d$ and $m$ are given integers.


The equation (7) has the remarkable property, which was known to Brahmagupta (628) and later rediscovered by Euler (1758), that if we have solutions for two values $m_1, m_2$ of $m$, then we can derive a solution for their product $m_1m_2$. This follows from the identity

$$(x_1^2 - dy_1^2)(x_2^2 - dy_2^2) = x^2 - dy^2,$$
where

$$x = x_1x_2 + dy_1y_2,\quad y = x_1y_2 + y_1x_2.$$

(In fact, Brahmagupta's identity is just a restatement of the norm relation $N(\alpha\beta) = N(\alpha)N(\beta)$ for elements $\alpha, \beta$ of a quadratic field.) In particular, from two solutions of the equation

$$x^2 - dy^2 = 1, \tag{8}$$

a third solution can be obtained by composition in this way.

Composition of solutions is evidently commutative and associative. In fact the 
solutions of (8) form an abelian group under composition, with the trivial solution 1, 0 
as identity element and the solution x, —y as the inverse of the solution x, y. Also, by 
composing an arbitrary solution x, y of (8) with the trivial solution —1,0 we obtain 
the solution —x, —y. 
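Composition takes only a couple of lines of code. The sketch below builds powers of the solution $(8, 3)$ of $x^2 - 7y^2 = 1$ (an example chosen only for illustration). It also checks the inverse and the product rule for $m_1m_2$. The function names are ours.

```python
def compose(s1, s2, d):
    """Brahmagupta composition of (x1, y1) and (x2, y2) for the form x^2 - d y^2."""
    (x1, y1), (x2, y2) = s1, s2
    return x1 * x2 + d * y1 * y2, x1 * y2 + y1 * x2


def norm(s, d):
    """x^2 - d y^2, i.e. the value of m for the pair s = (x, y)."""
    x, y = s
    return x * x - d * y * y


d = 7
fund = (8, 3)                      # 8^2 - 7 * 3^2 = 1
powers = [(1, 0)]                  # identity element of the group of solutions of (8)
for _ in range(3):
    powers.append(compose(powers[-1], fund, d))
print(powers)                      # [(1, 0), (8, 3), (127, 48), (2024, 765)]
print([norm(s, d) for s in powers])  # [1, 1, 1, 1]
print(compose(fund, (8, -3), d))   # (1, 0): (8, -3) is the inverse of (8, 3)

# Product rule: 3^2 - 7 * 1^2 = 2, so composing (3, 1) with itself gives m = 4.
sq = compose((3, 1), (3, 1), d)
print(sq, norm(sq, d))             # (16, 6) 4
```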

Suppose first that $d < 0$. Evidently (7) is insoluble if $m < 0$ and $x = y = 0$ is the only solution if $m = 0$. If $m > 0$, there are at most finitely many solutions and we may find them all by testing, for each non-negative integer $y \le (-m/d)^{1/2}$, whether $m + dy^2$ is a perfect square.

Suppose now that $d > 0$. If $d = e^2$ is a perfect square, then (7) is equivalent to the finite set of simultaneous linear Diophantine equations

$$x - ey = m',\quad x + ey = m'',$$

where $m', m''$ are any integers such that $m'm'' = m$. Thus we now suppose also that $d$ is not a perfect square. Then $\xi = \sqrt{d}$ is irrational.

If $0 < m^2 < d$ then, by Proposition 5, any positive solution $x, y$ of (7) has the form $x = p_n$, $y = q_n$, where $p_n/q_n$ is a convergent of $\xi$. In particular, all positive solutions of $x^2 - dy^2 = \pm 1$ are obtained in this way.

On the other hand, as we now show, if $p_n/q_n$ is any convergent of $\xi$ then

$$|p_n^2 - dq_n^2| < 2\sqrt{d}.$$

If $n = 0$, then $|p_0^2 - dq_0^2| = |a_0^2 - d|$, where $a_0 < \sqrt{d} < a_0 + 1$ and so $0 < d - a_0^2 < \sqrt{d} + a_0 < 2\sqrt{d}$. Now suppose $n > 0$. Then $|p_n - q_n\xi| < 1/q_{n+1}$ and hence

$$|p_n^2 - dq_n^2| = |p_n - q_n\xi|\,|p_n - q_n\xi + 2q_n\xi| < q_{n+1}^{-1}(q_{n+1}^{-1} + 2q_n\xi) < 2\xi.$$
An easy congruence argument shows that the equation
$$x^2 - dy^2 = -1 \tag{9}$$

has no solutions in integers unless $d \equiv 1 \bmod 4$ or $d \equiv 2 \bmod 8$. It will now be shown that the equation (8), on the other hand, always has solutions in positive integers.
Proposition 10 Let $d$ be a positive integer which is not a perfect square. Suppose $\xi = \sqrt{d}$ has complete quotients $\xi_n$, convergents $p_n/q_n$, and continued fraction expansion $[a_0, \overline{a_1, \ldots, a_h}]$ of period $h$.

Then $p_n^2 - dq_n^2 = \pm 1$ if and only if $n = kh - 1$ for some integer $k > 0$ and in this case

$$p_{kh-1}^2 - dq_{kh-1}^2 = (-1)^{kh}.$$
Proof From $\xi = (p_n\xi_{n+1} + p_{n-1})/(q_n\xi_{n+1} + q_{n-1})$ we obtain

$$(p_n - q_n\xi)\xi_{n+1} = q_{n-1}\xi - p_{n-1}.$$

Multiplying by $(-1)^{n+1}(p_n + q_n\xi)$, we get

$$s_n\xi_{n+1} = \xi + r_n,$$
where
$$s_n = (-1)^{n+1}(p_n^2 - dq_n^2),\quad r_n = (-1)^n(p_{n-1}p_n - dq_{n-1}q_n).$$
Thus $s_n$ and $r_n$ are integers. Moreover, since $\xi_{n+1+kh} = \xi_{n+1}$ and $\xi$ is irrational,

$$s_{n+kh} = s_n \text{ and } r_{n+kh} = r_n \text{ for all positive integers } k.$$

If $p_n^2 - dq_n^2 = \pm 1$, then actually $p_n^2 - dq_n^2 = (-1)^{n+1}$, since $p_n/q_n$ is less than or greater than $\xi$ according as $n$ is even or odd. Hence $s_n = 1$ and $\xi_{n+1} = \xi + r_n$. Taking integral parts, we get $a_{n+1} = a_0 + r_n$. Consequently

$$\xi_{n+2} = 1/(\xi_{n+1} - a_{n+1}) = 1/(\xi - a_0) = \xi_1.$$

Thus $\xi_{n+2} = \xi_1$, which implies that $n = kh - 1$ for some positive integer $k$.
On the other hand, if $n = kh - 1$ for some positive integer $k$, then $\xi_{n+2} = \xi_1$ and hence

$$\xi_{n+1} - a_{n+1} = \xi - a_0.$$

Thus $\xi_{n+1} = \xi + a_{n+1} - a_0$, which implies that $s_n = 1$, since $\xi$ is irrational.

It follows from Proposition 10 that, if $d$ is a positive integer which is not a perfect square, then the equation (8) always has a solution in positive integers and all such solutions are given by

$$x = p_{kh-1},\ y = q_{kh-1} \quad (k = 1, 2, \ldots) \quad\text{if } h \text{ is even},$$
$$x = p_{2kh-1},\ y = q_{2kh-1} \quad (k = 1, 2, \ldots) \quad\text{if } h \text{ is odd}.$$
The least solution in positive integers, obtained by taking $k = 1$, is called the fundamental solution of (8).


On the other hand, the equation (9) has a solution in positive integers if and only if $h$ is odd and all such solutions are then given by

$$x = p_{kh-1},\ y = q_{kh-1} \quad (k = 1, 3, 5, \ldots).$$
The least solution in positive integers, obtained by taking $k = 1$, is called the fundamental solution of (9).

To illustrate these results, suppose $d = a^2 + 1$ for some $a \in \mathbb{N}$. Since $\sqrt{d} = [a, \overline{2a}]$, the equation $x^2 - dy^2 = -1$ has the fundamental solution $x = a$, $y = 1$ and the equation $x^2 - dy^2 = 1$ has the fundamental solution $x = 2a^2 + 1$, $y = 2a$. Again, suppose $d = a^2 + a$ for some $a \in \mathbb{N}$. Since $\sqrt{d} = [a, \overline{2, 2a}]$, the equation $x^2 - dy^2 = -1$ is insoluble, but the equation $x^2 - dy^2 = 1$ has the fundamental solution $x = 2a + 1$, $y = 2$.
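Proposition 10 translates directly into a procedure for computing the fundamental solutions. The sketch below computes one period of $\sqrt{d}$ with the standard integer recurrence for the complete quotients $\xi_n = (m_n + \sqrt{d})/s_n$. This is a well-known scheme, assumed here rather than taken from the text, and the names `sqrt_cf` and `fundamental_solutions` are ours. The sketch stops at the first $a_n = 2a_0$ and then reads off $p_{h-1}, q_{h-1}$ or $p_{2h-1}, q_{2h-1}$.

```python
from math import isqrt


def sqrt_cf(d):
    """a_0 and one period [a_1, ..., a_h] of sqrt(d), for d > 0 not a perfect square."""
    a0 = isqrt(d)
    if a0 * a0 == d:
        raise ValueError("d must not be a perfect square")
    m, s, a = 0, 1, a0           # xi_n = (m + sqrt(d))/s, a = a_n
    period = []
    while a != 2 * a0:           # a_n = 2 a_0 first occurs at n = h (cf. Proposition 9)
        m = a * s - m
        s = (d - m * m) // s
        a = (a0 + m) // s
        period.append(a)
    return a0, period


def fundamental_solutions(d):
    """Fundamental solution of x^2 - d y^2 = 1, and of x^2 - d y^2 = -1 if h is odd."""
    a0, period = sqrt_cf(d)
    h = len(period)
    p_prev, p, q_prev, q = 1, a0, 0, 1
    conv = [(p, q)]              # p_0/q_0, ..., p_{2h}/q_{2h}
    for a in period * 2:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        conv.append((p, q))
    # Proposition 10: p_{kh-1}^2 - d q_{kh-1}^2 = (-1)^{kh}
    if h % 2 == 0:
        return conv[h - 1], None
    return conv[2 * h - 1], conv[h - 1]


print(fundamental_solutions(61))  # ((1766319049, 226153980), (29718, 3805))
print(fundamental_solutions(26))  # d = a^2 + 1 with a = 5: ((51, 10), (5, 1))
print(fundamental_solutions(6))   # d = a^2 + a with a = 2: ((5, 2), None)
```

The printed cases reproduce the two families of examples above and, for $d = 61$, the solution attributed below to Bhascara.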

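It is not difficult to obtain upper bounds for the fundamental solutions. Since $\xi = \sqrt{d}$ is a root of the polynomial $t^2 - d$ and since its complete quotients $\xi_n$ are reduced for $n \ge 1$, they have the form

$$\xi_n = (-B_n + \sqrt{D})/2A_n,$$

where $D = 4d$, $0 < -B_n < \sqrt{D}$ and $A_n \ge 1$. Therefore $a_0 = \lfloor\xi\rfloor < \sqrt{d}$ and

$$a_n = \lfloor\xi_n\rfloor < 2\sqrt{d} \quad\text{for } n \ge 1.$$

If we put $\alpha = \lfloor 2\sqrt{d} \rfloor$, so that $a_n \le \alpha$ for all $n \ge 0$, it is easily shown by induction that

$$p_n \le (\alpha + \alpha^{-1})^{n+1}/2,\quad q_n \le (\alpha + \alpha^{-1})^n \quad (n \ge 0).$$

These inequalities may now be combined with any upper bound for the period $h$ (cf. §3).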

Under composition, the fundamental solution of (8) generates an infinite cyclic group $\mathcal{C}$ of solutions of (8). Furthermore, by composing the fundamental solution of (9) with any element of $\mathcal{C}$ we obtain infinitely many solutions of (9). We are going to show that, by composing also with the trivial solution $-1, 0$ of (8), all integral solutions of (8) and (9) are obtained in this way. This can be proved by means of continued fractions, but the following argument due to Nagell (1950) provides additional information.


Proposition 11 Let $d$ be a positive integer which is not a perfect square, let $m$ be a positive integer, and let $x_0, y_0$ be the fundamental solution of the equation (8).
If the equation

$$u^2 - dv^2 = m \tag{10}$$

has an integral solution, then it actually has one for which $u^2 \le m(x_0 + 1)/2$, $dv^2 \le m(x_0 - 1)/2$.
Similarly, if the equation

$$u^2 - dv^2 = -m \tag{11}$$

has an integral solution, then it actually has one for which $u^2 \le m(x_0 - 1)/2$, $dv^2 \le m(x_0 + 1)/2$.


Proof By composing a given solution of (10) with any solution in the subgroup $\mathcal{C}$ of solutions of (8) which is generated by the solution $x_0, y_0$ we obtain again a solution of (10). Let $u_0, v_0$ be the solution of (10) obtained in this way for which $v_0$ has its least non-negative value. Then $u_0^2 = m + dv_0^2$ also has its least value and by changing the sign of $u_0$ we may suppose $u_0 > 0$. By composing the solution $u_0, v_0$ of (10) with the inverse of the fundamental solution $x_0, y_0$ of (8) we obtain the solution

$$u = x_0u_0 - dy_0v_0,\quad v = x_0v_0 - y_0u_0$$
of (10). Since
$$u = x_0u_0 - dy_0v_0 = x_0u_0 - [(x_0^2 - 1)(u_0^2 - m)]^{1/2} > 0,$$
we must have
$$x_0u_0 - dy_0v_0 \ge u_0.$$
Hence
$$(x_0 - 1)^2u_0^2 \ge d^2y_0^2v_0^2 = (x_0^2 - 1)(u_0^2 - m).$$
Thus
$$(x_0 - 1)/(x_0 + 1) \ge 1 - m/u_0^2,$$

which implies $u_0^2 \le m(x_0 + 1)/2$ and hence $dv_0^2 \le m(x_0 - 1)/2$.
For the equation (11) we begin in the same way. Then from

$$(x_0v_0)^2 = (y_0^2 + 1/d)(u_0^2 + m) > y_0^2u_0^2$$
we obtain $v = x_0v_0 - y_0u_0 > 0$ and hence $x_0v_0 - y_0u_0 \ge v_0$. Thus
$$d(x_0 - 1)^2v_0^2 \ge dy_0^2u_0^2$$
and hence

$$(x_0 - 1)^2(u_0^2 + m) \ge (x_0^2 - 1)u_0^2.$$

The argument can now be completed in the same way as before.


The proof of Proposition 11 shows that if (10), or (11), has an integral solution, then we obtain all solutions by finding the finitely many solutions $u, v$ which satisfy the inequalities in the statement of Proposition 11 and composing them with all solutions in $\mathcal{C}$ of (8).

The only solutions $x, y$ of (8) for which $x^2 \le (x_0 + 1)/2$ are the trivial ones $x = \pm 1$, $y = 0$. Hence any solution of (8) is in $\mathcal{C}$ or is obtained by reversing the signs of a solution in $\mathcal{C}$.

If $u, v$ is a positive solution of (9) such that $u^2 \le (x_0 - 1)/2$, $dv^2 \le (x_0 + 1)/2$, then $x = u^2 + dv^2$, $y = 2uv$ is a positive solution of (8) such that $x \le x_0$. Hence $(x, y) = (x_0, y_0)$ is the fundamental solution of (8) and $u^2 = (x_0 - 1)/2$, $dv^2 = (x_0 + 1)/2$. Thus $(u, v)$ is uniquely determined and is the fundamental solution of (9). Hence, if (9) has a solution, any solution is obtained by composing the fundamental solution of (9) with an element of $\mathcal{C}$ or by reversing the signs of such a solution.

A necessary condition for the solubility in integers of the equation (9) is that $d$ may be represented as a sum of two squares. For the period $h$ of the continued fraction expansion $\xi = \sqrt{d} = [a_0, \overline{a_1, \ldots, a_h}]$ must be odd, say $h = 2m + 1$. It follows from Proposition 9 that

$$\xi_{m+1} = [\overline{a_m, \ldots, a_1, 2a_0, a_1, \ldots, a_m}],$$
and then from Proposition 7 that $\xi'_{m+1} = -1/\xi_{m+1}$. But, by the proof of Proposition 10,

$$s_m\xi_{m+1} = \xi + r_m,$$
where $s_m$ and $r_m$ are integers. Hence
$$-1 = \xi_{m+1}\xi'_{m+1} = (\xi + r_m)(-\xi + r_m)/s_m^2 = (r_m^2 - d)/s_m^2$$

and thus $d = r_m^2 + s_m^2$. The formulas for $s_m$ and $r_m$ show that, if $p_n/q_n$ are the convergents of $\sqrt{d}$, then $d = x^2 + y^2$ with

$$x = p_{m-1}p_m - dq_{m-1}q_m,\quad y = p_m^2 - dq_m^2.$$

Unfortunately, the equation (9) may be insoluble, even though $d$ is a sum of two squares. As an example, take $d = 34 = 5^2 + 3^2$. It is easily verified that the fundamental solution of the equation $x^2 - 34y^2 = 1$ is $x_0 = 35$, $y_0 = 6$. If the equation $u^2 - 34v^2 = -1$ were soluble in integers, then, by Proposition 11, it would have a solution $u, v$ such that $34v^2 \le 18$, which is clearly impossible.
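Proposition 11 confines the search for a representative solution to a finite box. In the sketch below, the name `nagell_candidates` is ours, and $x_0$ must be supplied, for instance from the continued fraction of $\sqrt{d}$. Only $v$ needs an explicit bound, because the bound on $u^2$ then follows from $u^2 = m + dv^2$.

```python
from math import isqrt


def nagell_candidates(d, m, x0):
    """Solutions u, v >= 0 of u^2 - d v^2 = m (m != 0) in the box of Proposition 11.

    x0 is the x-part of the fundamental solution of x^2 - d y^2 = 1 (assumed known).
    For m > 0 the box is d v^2 <= m (x0 - 1)/2; for m < 0 it is d v^2 <= |m| (x0 + 1)/2.
    """
    k = abs(m)
    slack = x0 - 1 if m > 0 else x0 + 1
    found = []
    v = 0
    while 2 * d * v * v <= k * slack:
        u2 = m + d * v * v
        if u2 >= 0 and isqrt(u2) ** 2 == u2:
            found.append((isqrt(u2), v))
        v += 1
    return found


print(nagell_candidates(34, -1, 35))   # []: u^2 - 34 v^2 = -1 has no solution
print(nagell_candidates(13, -1, 649))  # [(18, 5)]
```

For $d = 34$ the empty list confirms the argument above. For $d = 13$ the search returns exactly the fundamental solution $(18, 5)$ of $x^2 - 13y^2 = -1$, which lies on the boundary $u^2 = (x_0 - 1)/2$.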

As already observed, the equation (9) has no integral solutions if $d \equiv 3 \bmod 4$. It will now be shown that (9) does have integral solutions if $d = p$ is prime and $p \equiv 1 \bmod 4$. For let $x, y$ be the fundamental solution of the equation (8). Since any square is congruent to 0 or 1 mod 4, we must have $y^2 \equiv 0$ and $x^2 \equiv 1$. Thus $y = 2z$ for some positive integer $z$ and

$$(x - 1)(x + 1) = 4pz^2.$$

Since $x$ is odd, $x - 1$ and $x + 1$ have greatest common divisor 2. It follows that there exist positive integers $u, v$ such that

$$\text{either } x - 1 = 2pu^2,\ x + 1 = 2v^2 \quad\text{or}\quad x - 1 = 2u^2,\ x + 1 = 2pv^2.$$

In the first case $v^2 - pu^2 = 1$, which contradicts the choice of $x, y$ as the fundamental solution of (8), since $v < x$. Thus only the second case is possible and then $u^2 - pv^2 = -1$. (In fact, $u, v$ is the fundamental solution of (9).)

This proves again that any prime $p \equiv 1 \bmod 4$ may be represented as a sum of two squares, and moreover shows that an explicit construction for this representation is provided by the continued fraction expansion of $\sqrt{p}$.
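The formulas $x = p_{m-1}p_m - dq_{m-1}q_m$, $y = p_m^2 - dq_m^2$ obtained earlier turn the continued fraction of $\sqrt{p}$ into an explicit representation $p = x^2 + y^2$. In the sketch below, the period of $\sqrt{d}$ is computed with a standard integer recurrence for its complete quotients. That recurrence is an assumption, not taken from the text, and the function names are ours. The starting values $p_{-1} = 1$, $q_{-1} = 0$ cover the case $h = 1$, that is, $m = 0$.

```python
from math import isqrt


def sqrt_cf(d):
    """a_0 and one period [a_1, ..., a_h] of sqrt(d), for d > 0 not a perfect square."""
    a0 = isqrt(d)
    if a0 * a0 == d:
        raise ValueError("d must not be a perfect square")
    m, s, a = 0, 1, a0
    period = []
    while a != 2 * a0:
        m = a * s - m
        s = (d - m * m) // s
        a = (a0 + m) // s
        period.append(a)
    return a0, period


def two_squares_from_cf(d):
    """(|x|, |y|) with x^2 + y^2 = d from the convergents p_{m-1}/q_{m-1}, p_m/q_m,
    where h = 2m + 1 is the period of sqrt(d); returns None when h is even."""
    a0, period = sqrt_cf(d)
    h = len(period)
    if h % 2 == 0:
        return None
    m = (h - 1) // 2
    p_prev, p, q_prev, q = 1, a0, 0, 1   # p_{-1}, p_0, q_{-1}, q_0
    for a in period[:m]:                 # advance to p_m/q_m
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
    x = p_prev * p - d * q_prev * q
    y = p * p - d * q * q
    return abs(x), abs(y)


print(two_squares_from_cf(13))  # (2, 3)
print(two_squares_from_cf(34))  # None: 34 = 5^2 + 3^2, but the period of sqrt(34) is even
```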

The representation of a prime $p \equiv 1 \bmod 4$ in the form $x^2 + y^2$ is actually unique, apart from interchanging $x$ and $y$ and changing their signs. For suppose

$$x^2 + y^2 = p = u^2 + v^2,$$
where $x, y, u, v$ are all positive integers. Then
$$y^2u^2 - x^2v^2 = (p - x^2)u^2 - x^2(p - u^2) = p(u^2 - x^2).$$

Hence $yu \equiv \epsilon xv \bmod p$, where $\epsilon = \pm 1$. On the other hand,

$$p^2 = (x^2 + y^2)(u^2 + v^2) = (xu + \epsilon yv)^2 + (xv - \epsilon yu)^2.$$
Since the second term on the right is divisible by $p^2$, we must have $xv = \epsilon yu$ or $xu = -\epsilon yv$. Evidently $\epsilon = 1$ in the first case and $\epsilon = -1$ in the second case. Since $(x, y) = (u, v) = 1$, it follows that either $x = u$, $y = v$ or $x = v$, $y = u$.

The equation $x^2 - dy^2 = 1$, where $d$ is a positive integer which is not a perfect square, is generally known as Pell's equation, following an erroneous attribution of Euler. The problem of finding its integral solutions was issued as a challenge by Fermat (1657). In the same year Brouncker and Wallis gave a method of solution which is essentially the same as the solution by continued fractions. The first complete proof that a nontrivial solution always exists was given by Lagrange (1768).

Unknown to them all, the problem had been considered centuries earlier by Hindu 
mathematicians. Special cases of Pell’s equation were solved by Brahmagupta (628) 
and a general method of solution, which was described by Bhascara II (1150), was 
known to Jayadeva at least a century earlier. No proofs were given, but their method 
is a modification of the solution by continued fractions and is often faster in practice. 
Bhascara found the fundamental solution of the equation $x^2 - 61y^2 = 1$, namely

$$x = 1766319049,\quad y = 226153980,$$

a remarkable achievement for the era.


5 The Modular Group 


We recall that a complex number $w$ is said to be equivalent to a complex number $z$ if there exist integers $a, b, c, d$ with $ad - bc = \pm 1$ such that

$$w = (az + b)/(cz + d).$$
Since we can write
$$w = (az + b)(c\bar{z} + d)/|cz + d|^2,$$
the imaginary parts are related by
$$\Im w = (ad - bc)\Im z/|cz + d|^2.$$

Consequently $\Im w$ and $\Im z$ have the same sign if $ad - bc = 1$ and opposite signs if $ad - bc = -1$. Since the map $z \mapsto -z$ interchanges the upper and lower half-planes, we may restrict attention to $z$'s in the upper half-plane $\mathcal{H} = \{z \in \mathbb{C} : \Im z > 0\}$ and to $w$'s which are properly equivalent to them, i.e. with $ad - bc = 1$.

A modular transformation is a map $f : \mathcal{H} \to \mathcal{H}$ of the form

$$f(z) = (az + b)/(cz + d),$$

where $a, b, c, d \in \mathbb{Z}$ and $ad - bc = 1$. Such a map is bijective and its inverse is again a modular transformation:

$$f^{-1}(z) = (dz - b)/(-cz + a).$$
Furthermore, if

$$g(z) = (a'z + b')/(c'z + d')$$

is another modular transformation, then the composite map $h = g \circ f$ is again a modular transformation:

$$h(z) = (a''z + b'')/(c''z + d''),$$

where

$$a'' = a'a + b'c, \quad b'' = a'b + b'd, \quad c'' = c'a + d'c, \quad d'' = c'b + d'd,$$

and hence

$$a''d'' - b''c'' = (a'd' - b'c')(ad - bc) = 1.$$


It follows that the set $\Gamma$ of all modular transformations is a group. Moreover, composition of modular transformations corresponds to multiplication of the corresponding matrices:

$$\begin{pmatrix} a'' & b'' \\ c'' & d'' \end{pmatrix} = \begin{pmatrix} a' & b' \\ c' & d' \end{pmatrix}\begin{pmatrix} a & b \\ c & d \end{pmatrix}.$$

However, the same modular transformation is obtained if the signs of $a, b, c, d$ are all changed (and in no other way). It follows that the modular group $\Gamma$ is isomorphic to the factor group $SL_2(\mathbb{Z})/\{\pm I\}$ of the special linear group $SL_2(\mathbb{Z})$ of all $2 \times 2$ integer matrices with determinant 1 by its centre $\{\pm I\}$.
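The correspondence between composition and matrix multiplication is easy to check numerically. The following sketch (with hypothetical helper names `mobius`, `matmul` and `inverse`, and two arbitrarily chosen elements of $SL_2(\mathbb{Z})$) composes two modular transformations both ways, checks the inverse formula, and confirms that negating all four entries does not change the map.

```python
def mobius(m, z):
    """Apply z -> (a z + b)/(c z + d) for the matrix m = ((a, b), (c, d))."""
    (a, b), (c, d) = m
    return (a * z + b) / (c * z + d)


def matmul(m1, m2):
    """Product m1 * m2 of 2 x 2 integer matrices."""
    (a, b), (c, d) = m1
    (e, f), (g, h) = m2
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


def det(m):
    (a, b), (c, d) = m
    return a * d - b * c


def inverse(m):
    """Matrix of the inverse transformation z -> (d z - b)/(-c z + a)."""
    (a, b), (c, d) = m
    return ((d, -b), (-c, a))


f = ((2, 1), (1, 1))        # two arbitrary elements of SL2(Z)
g = ((1, -3), (1, -2))
h = matmul(g, f)            # matrix of the composite g o f

z = complex(0.3, 1.7)       # an arbitrary point of the upper half-plane
assert det(f) == det(g) == det(h) == 1
assert abs(mobius(g, mobius(f, z)) - mobius(h, z)) < 1e-12
assert abs(mobius(inverse(f), mobius(f, z)) - z) < 1e-12
neg_h = tuple(tuple(-x for x in row) for row in h)
assert abs(mobius(neg_h, z) - mobius(h, z)) < 1e-12
print(h)                    # ((-1, -2), (0, -1)), i.e. z -> z + 2
```

Here $g \circ f$ is the translation $z \mapsto z + 2$ even though its matrix is the negative of that of $T^2$, which is why $\Gamma$ is a quotient of $SL_2(\mathbb{Z})$ rather than $SL_2(\mathbb{Z})$ itself.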


Proposition 12 The modular group $\Gamma$ is generated by the transformations

$$T(z) = z + 1, \quad S(z) = -1/z.$$


Proof It is evident that $S, T \in \Gamma$ and $S^2 = I$ is the identity transformation. Any $g \in \Gamma$ has the form

$$g(z) = (az + b)/(cz + d),$$

where $a, b, c, d \in \mathbb{Z}$ and $ad - bc = 1$. If $c = 0$, then $a = d = \pm 1$ and $g = T^m$, where $m = b/d \in \mathbb{Z}$. Similarly if $a = 0$, then $b = -c = \pm 1$ and $g = ST^m$, where $m = d/c \in \mathbb{Z}$. Suppose now that $ac \neq 0$. For any $n \in \mathbb{Z}$ we have


$$ST^{-n}g(z) = (a'z + b')/(c'z + d'),$$

where $a' = -c$, $b' = -d$, $c' = a - nc$ and $d' = b - nd$. We can choose $n = m_1$ so that for $g_1 = ST^{-m_1}g$ we have $|c'| < |c|$. If $a'c' \neq 0$, the argument can be repeated with $g_1$ in place of $g$. Since $|c|$ strictly decreases at each repetition, after finitely many repetitions we must obtain

$$ST^{-m_k}\cdots ST^{-m_1}g = T^m \quad\text{or}\quad ST^m.$$
Since $S^{-1} = S$ and $(T^m)^{-1} = T^{-m}$, it follows that

$$g = T^{m_1}S\cdots T^{m_k}ST^m \quad\text{or}\quad g = T^{m_1}S\cdots T^{m_k}T^m.$$


The proof of Proposition 12 may be regarded as an analogue of the continued fraction algorithm, since

$$T^{m_1}S\cdots T^{m_k}ST^m z = m_1 - \cfrac{1}{m_2 - \cfrac{1}{\ddots - \cfrac{1}{m_k - \cfrac{1}{m + z}}}}.$$
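The proof of Proposition 12 is effectively an algorithm, and the following sketch runs it on $2 \times 2$ integer matrices (the names `decompose`, `rebuild` and `evaluate` are hypothetical). At each step it takes $m_j$ to be the floor of $a/c$, one convenient choice that makes $|c'| < |c|$. It then checks the result against the continued-fraction form of the word in $S$ and $T$. Since a matrix is only determined by the transformation up to sign, the rebuilt product equals $\pm g$.

```python
def matmul(m1, m2):
    """Product m1 * m2 of 2 x 2 integer matrices."""
    (a, b), (c, d) = m1
    (e, f), (g, h) = m2
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


S = ((0, -1), (1, 0))


def T(m):
    return ((1, m), (0, 1))


def decompose(g):
    """Return (ms, m) with g = +-T^{m_1} S ... T^{m_k} S T^m, following the proof of Proposition 12."""
    (a, b), (c, d) = g
    if a * d - b * c != 1:
        raise ValueError("g must have determinant 1")
    ms = []
    while c != 0:
        n = a // c                                   # then c' = a - n c satisfies |c'| < |c|
        ms.append(n)
        a, b, c, d = -c, -d, a - n * c, b - n * d    # replace g by S T^{-n} g
    # Now c == 0, so a = d = +-1 and the remaining matrix is +-T^m with m = b/d = b*d.
    return ms, b * d


def rebuild(ms, m):
    """Multiply out T^{m_1} S ... T^{m_k} S T^m."""
    prod = ((1, 0), (0, 1))
    for n in ms:
        prod = matmul(matmul(prod, T(n)), S)
    return matmul(prod, T(m))


def evaluate(ms, m, z):
    """Evaluate the same word at z as m_1 - 1/(m_2 - ... - 1/(m_k - 1/(m + z)))."""
    w = m + z
    for n in reversed(ms):
        w = n - 1 / w
    return w


g = ((5, 3), (3, 2))
ms, m = decompose(g)
print(ms, m)                     # [1, -2, -2] 0
z = complex(0.2, 1.1)
assert abs(evaluate(ms, m, z) - (5 * z + 3) / (3 * z + 2)) < 1e-12
```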


Obviously $\Gamma$ is also generated by $S$ and $R := ST$. The transformation $R$ has order 3, since

$$R(z) = -1/(z + 1), \quad R^2(z) = -(1 + z)/z, \quad R^3(z) = z.$$
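The relations $S^2 = R^3 = I$ can be confirmed by multiplying matrices. In $SL_2(\mathbb{Z})$ both powers come out as $-I$, which represents the identity of $\Gamma$ after the identification of $\pm$ matrices made above. This is a small numerical check, not part of the original argument.

```python
def matmul(m1, m2):
    """Product m1 * m2 of 2 x 2 integer matrices."""
    (a, b), (c, d) = m1
    (e, f), (g, h) = m2
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


S = ((0, -1), (1, 0))        # S(z) = -1/z
T = ((1, 1), (0, 1))         # T(z) = z + 1
R = matmul(S, T)             # R = S o T, so R(z) = -1/(z + 1)
R2 = matmul(R, R)            # R^2(z) = -(1 + z)/z
R3 = matmul(R2, R)
MINUS_I = ((-1, 0), (0, -1))

print(R, R2)                 # ((0, -1), (1, 1)) ((-1, -1), (1, 0))
assert R3 == MINUS_I and matmul(S, S) == MINUS_I
```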


We are going to show that all other relations between the generators $S$ and $R$ are consequences of the relations $S^2 = R^3 = I$, so that $\Gamma$ is the free product of a cyclic group of order 2 and a cyclic group of order 3.

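Partition the upper half-plane $\mathcal{H}$ by putting

$$A = \{z \in \mathcal{H} : \operatorname{Re} z < 0\}, \quad B = \{z \in \mathcal{H} : \operatorname{Re} z \geq 0\}.$$

It is easily verified that

$$SA \subset B, \quad RB \subset A, \quad R^2B \subset A$$

(where the inclusions are strict). If $g' = SR^{e_1}SR^{e_2}\cdots SR^{e_n}$ for some $n \geq 1$, where $e_j \in \{1, 2\}$, it follows that $g'B \subset B$ and $g'SA \subset B$. Similarly, if $g'' = R^{e_1}S\cdots R^{e_n}$, then $g''B \subset A$ and $g''SA \subset A$. By taking account of the relations $S^2 = R^3 = I$, every $g \in \Gamma$ can be written in one of the forms

$$I, \quad S, \quad g', \quad g'', \quad g'S, \quad g''S.$$

But, by what has just been said, no element except the first is the identity transformation.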

The modular group is discrete, since there exists a neighbourhood of the identity transformation which contains no other element of $\Gamma$.


Proposition 13 The open set

$$F = \{z \in \mathcal{H} : -1/2 < \operatorname{Re} z < 1/2, \ |z| > 1\}$$

(see Figure 1) is a fundamental domain for the modular group $\Gamma$, i.e. distinct points of $F$ are not equivalent and each point of $\mathcal{H}$ is equivalent to some point of $F$ or its boundary $\partial F$.
Proof For any $z \in \mathbb{C}$ we write $z = x + iy$, where $x, y \in \mathbb{R}$. We show first that no two points of $F$ are equivalent. Assume on the contrary that there exist distinct points $z, z' \in F$ with $y' \geq y$ such that

$$z' = (az + b)/(cz + d)$$

for some $a, b, c, d \in \mathbb{Z}$ with $ad - bc = 1$. If $c = 0$, then $a = d = \pm 1$, $b \neq 0$ and $z' = z + b/d$, which is impossible for $z, z' \in F$. Hence $c \neq 0$. Since

$$y' = y/|cz + d|^2,$$

we have $|cz + d| \leq 1$. Thus $|z + d/c| \leq 1/|c|$, which is impossible not only if $|c| \geq 2$ but also if $c = \pm 1$.

We now show that any $z_0 \in \mathcal{H}$ is equivalent to a point of the closure $\overline{F} = F \cup \partial F$. We can choose $m_0 \in \mathbb{Z}$ so that $z_1 = z_0 + m_0$ satisfies $|x_1| \leq 1/2$. If $|z_1| \geq 1$, there is nothing more to do. Thus we now suppose $|z_1| < 1$. Put $z_2 = -1/z_1$. Then

$$y_2 = y_1/|z_1|^2 > y_1$$

and actually $y_2 > 2y_1$ if $y_1 < 1/2$, since then $|z_1|^2 < 1/4 + 1/4 = 1/2$. We now repeat the process, with $z_2$ in place of $z_0$, and choose $m_2 \in \mathbb{Z}$ so that $z_3 = z_2 + m_2$ satisfies $|x_3| \leq 1/2$. From $z_3 = (m_2z_1 - 1)/z_1$ we obtain

$$|z_3|^2 = \{(m_2x_1 - 1)^2 + (m_2y_1)^2\}/(x_1^2 + y_1^2).$$

Assume $|z_3| < 1$. Then $m_2 \neq 0$ and also $m_2 \neq \pm 1$, since $|1 \pm x_1| \geq 1/2 \geq |x_1|$. If $|m_2| \geq 2$, then $|z_3|^2 > 4y_1^2$ and hence $y_1 < 1/2$. Thus in passing from $z_1$ to $z_3$ we obtain either $z_3 \in \overline{F}$ or $y_3 = y_2 > 2y_1$. Hence, after repeating the process finitely many times we must obtain a point $z_{2k+1} \in \overline{F}$.
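The second half of this proof is a reduction algorithm: translate so that $|\operatorname{Re} z| \leq 1/2$, and apply $S$ while $|z| < 1$. A floating-point sketch of it follows (the name `reduce_to_closure_of_F` is hypothetical). It also tracks the matrix of the modular transformation applied so far, so the result can be checked. Because of rounding, points very close to $\partial F$ may land just outside $\overline{F}$.

```python
def reduce_to_closure_of_F(z, max_steps=10_000):
    """Move z (Im z > 0) into the closure of F by the steps in the proof of Proposition 13.

    Returns (w, (a, b, c, d)) with w = (a z + b)/(c z + d) and ad - bc = 1.
    """
    if z.imag <= 0:
        raise ValueError("z must lie in the upper half-plane")
    a, b, c, d = 1, 0, 0, 1                 # matrix of the modular transformation applied so far
    for _ in range(max_steps):
        m = -round(z.real)                  # translation T^m, giving |Re z| <= 1/2
        z += m
        a, b = a + m * c, b + m * d         # left-multiply the matrix by T^m
        if abs(z) >= 1:
            return z, (a, b, c, d)
        z = -1 / z                          # S; Im z increases because |z| < 1
        a, b, c, d = -c, -d, a, b           # left-multiply the matrix by S
    raise RuntimeError("step limit reached")


z0 = complex(0.1, 0.05)
w, (a, b, c, d) = reduce_to_closure_of_F(z0)
# w is approximately 4j and the matrix is (8, -1, 1, 0), i.e. w = 8 - 1/z0
```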


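Proposition 13 implies that the sets $\{g(\overline{F}) : g \in \Gamma\}$ form a tiling of $\mathcal{H}$, since

$$\mathcal{H} = \bigcup_{g \in \Gamma} g(\overline{F}), \qquad g(F) \cap g'(F) = \emptyset \ \text{if}\ g, g' \in \Gamma \ \text{and}\ g \neq g'.$$

Fig. 1. Fundamental domain for $\Gamma$.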
This is illustrated in Figure 2, where the domain $g(F)$ is represented simply by the group element $g$.

There is an interesting connection between the modular group and binary quadratic forms. The discriminant of a binary quadratic form

$$f = ax^2 + bxy + cy^2$$

with coefficients $a, b, c \in \mathbb{R}$ is $D := b^2 - 4ac$. The quadratic form is indefinite (i.e. assumes both positive and negative values) if and only if $D > 0$, and positive definite (i.e. assumes only positive values unless $x = y = 0$) if and only if $D < 0$, $a > 0$, which implies also $c > 0$. (If $D = 0$, the quadratic form is proportional to the square of a linear form.)

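If we make a linear change of variables

$$x = \alpha x' + \beta y', \quad y = \gamma x' + \delta y',$$

where $\alpha, \beta, \gamma, \delta \in \mathbb{Z}$ and $\alpha\delta - \beta\gamma = 1$, the quadratic form $f$ is transformed into the quadratic form

$$f' = a'x'^2 + b'x'y' + c'y'^2,$$

where

$$\begin{aligned}
a' &= a\alpha^2 + b\alpha\gamma + c\gamma^2,\\
b' &= 2a\alpha\beta + b(\alpha\delta + \beta\gamma) + 2c\gamma\delta,\\
c' &= a\beta^2 + b\beta\delta + c\delta^2,
\end{aligned}$$

and hence

$$b'^2 - 4a'c' = b^2 - 4ac = D.$$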


The quadratic forms f and f’ are said to be properly equivalent. 
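The substitution formulas for $a', b', c'$ and the invariance of the discriminant can be checked directly. The sketch below (hypothetical function names `transform`, `value` and `discriminant`) computes $f'$ from $f$ and $(\alpha, \beta, \gamma, \delta)$, then compares $f'(x', y')$ with $f(\alpha x' + \beta y', \gamma x' + \delta y')$ on a grid of integer points.

```python
def transform(form, alpha, beta, gamma, delta):
    """Coefficients (a', b', c') of f'(x', y') = f(alpha x' + beta y', gamma x' + delta y')."""
    a, b, c = form
    if alpha * delta - beta * gamma != 1:
        raise ValueError("need alpha*delta - beta*gamma = 1")
    a1 = a * alpha**2 + b * alpha * gamma + c * gamma**2
    b1 = 2 * a * alpha * beta + b * (alpha * delta + beta * gamma) + 2 * c * gamma * delta
    c1 = a * beta**2 + b * beta * delta + c * delta**2
    return a1, b1, c1


def value(form, x, y):
    a, b, c = form
    return a * x * x + b * x * y + c * y * y


def discriminant(form):
    a, b, c = form
    return b * b - 4 * a * c


f = (3, 5, -7)                       # an arbitrary indefinite form, D = 109
f1 = transform(f, 2, 1, 1, 1)        # x = 2x' + y', y = x' + y'
print(f1)                            # (15, 13, 1)
assert discriminant(f1) == discriminant(f) == 109
assert all(value(f1, x, y) == value(f, 2 * x + y, x + y)
           for x in range(-5, 6) for y in range(-5, 6))
```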


Fig. 2. Tiling of $\mathcal{H}$ by $\Gamma$.


206 IV Continued Fractions and Their Uses 


Thus properly equivalent forms have the same discriminant. As the name implies, 
proper equivalence is indeed an equivalence relation. Moreover, any form properly 
equivalent to an indefinite form is again indefinite, and any form properly equivalent 
to a positive definite form is again positive definite. 

We will now show that any binary quadratic form is properly equivalent to one 
which is in some sense canonical. The indefinite and positive definite cases will be 
treated separately. 

Suppose first that $f$ is positive definite, so that $D < 0$, $a > 0$ and $c > 0$. With the quadratic form $f$ we associate a point $\tau(f)$ of the upper half-plane $\mathcal{H}$, namely

$$\tau(f) = (-b + i\sqrt{-D})/2a.$$

Thus $\tau(f)$ is the root with positive imaginary part of the polynomial $at^2 + bt + c$. Conversely, for any given $D < 0$ and $\tau \in \mathcal{H}$, there is a unique positive definite quadratic form $f$ with discriminant $D$ such that $\tau(f) = \tau$. In fact, if $\tau = \xi + i\eta$, where $\xi, \eta \in \mathbb{R}$ and $\eta > 0$, we must take

$$a = (-D)^{1/2}/2\eta, \quad b = -2a\xi, \quad c = (b^2 - D)/4a.$$
Let $f'$, as above, be a form properly equivalent to $f$. If $t = (\alpha t' + \beta)/(\gamma t' + \delta)$, then

$$at^2 + bt + c = (a't'^2 + b't' + c')/(\gamma t' + \delta)^2.$$

It follows that if $\tau = \tau(f)$ and $\tau' = \tau(f')$, then $\tau = (\alpha\tau' + \beta)/(\gamma\tau' + \delta)$. Thus $\tau'$ is properly equivalent to $\tau$, in the terminology introduced in Section 1.
By Proposition 13 we may choose the change of variables so that $\tau' \in \overline{F}$, i.e.

$$-1/2 \leq \operatorname{Re}\tau' \leq 1/2, \quad |\tau'| \geq 1.$$

It is easily verified that this is the case if and only if for $f'$ we have

$$|b'| \leq a' \leq c'.$$

Such a quadratic form f’ is said to be reduced. Thus every positive definite binary 
quadratic form is properly equivalent to a reduced form. (It is possible to ensure 
that every positive definite binary quadratic form is properly equivalent to a unique 
reduced form by slightly restricting the definition of ‘reduced’, but we will have no 
need of this.) 
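For forms with integer coefficients, combining the translations $T^n$ with the substitution corresponding to $S$ gives the classical reduction procedure. The following is a sketch (the name `reduce_positive_definite` is hypothetical), not necessarily the procedure intended in the text. The substitution $x = x' + ny'$, $y = y'$ sends $(a, b, c)$ to $(a, b + 2an, an^2 + bn + c)$, and $x = -y'$, $y = x'$ sends it to $(c, -b, a)$. Each swap strictly decreases $a$, so the loop stops, at a form with $|b| \leq a \leq c$.

```python
def reduce_positive_definite(a, b, c):
    """Return a form properly equivalent to ax^2 + bxy + cy^2 (integers, a > 0, b^2 < 4ac)
    that satisfies |b| <= a <= c."""
    if a <= 0 or b * b - 4 * a * c >= 0:
        raise ValueError("form must be positive definite")
    while True:
        # x = x' + n y', y = y' (alpha, beta, gamma, delta = 1, n, 0, 1), with n chosen
        # so that the new middle coefficient b + 2an lies in the range (-a, a].
        n = (a - b) // (2 * a)
        a, b, c = a, b + 2 * a * n, a * n * n + b * n + c
        if a <= c:
            return a, b, c
        # x = -y', y = x' (alpha, beta, gamma, delta = 0, -1, 1, 0): (a, b, c) -> (c, -b, a).
        a, b, c = c, -b, a


print(reduce_positive_definite(5, 8, 4))   # (1, 0, 4)
```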

If the coefficients of $f$ are integers, then so also are the coefficients of $f'$, and $\tau, \tau'$ are complex quadratic irrationals. There are only finitely many reduced forms $f$ with integer coefficients and with a given discriminant $D < 0$. For, if $f$ is reduced, then

$$4b^2 \leq 4a^2 \leq 4ac = b^2 - D$$

and hence $b^2 \leq -D/3$. Since $4ac = b^2 - D$, for each of the finitely many possible values of $b$ there are only finitely many possible values for $a$ and $c$.

A quadratic form $f = ax^2 + bxy + cy^2$ is said to be primitive if the coefficients $a, b, c$ are integers with greatest common divisor 1. For any integer $D < 0$, let $h^\dagger(D)$ denote the number of primitive positive definite quadratic forms with discriminant $D$ which are properly inequivalent, i.e. the number of proper equivalence classes of such forms. By what has been said, $h^\dagger(D)$ is finite.
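Since reduced forms are easy to list, $h^\dagger(D)$ can be computed for small $D < 0$ by enumeration. The sketch below assumes the usual restricted definition of ‘reduced’ alluded to above: additionally require $b \geq 0$ when $|b| = a$ or $a = c$. Under that definition each proper equivalence class contains exactly one reduced form. That uniqueness statement is standard but is not proved here, and the name `class_number_positive_definite` is hypothetical.

```python
from math import gcd, isqrt


def class_number_positive_definite(D):
    """Number of primitive reduced forms (a, b, c) with b^2 - 4ac = D < 0.

    'Reduced' here means |b| <= a <= c, and additionally b >= 0 if |b| = a or a = c
    (assumed normalization giving exactly one form per proper equivalence class).
    """
    if D >= 0 or D % 4 not in (0, 1):
        raise ValueError("D must be negative and congruent to 0 or 1 mod 4")
    count = 0
    b_max = isqrt(-D // 3)            # a reduced form has 3 b^2 <= -D
    for b in range(-b_max, b_max + 1):
        if (b - D) % 2:               # b must have the same parity as D
            continue
        ac = (b * b - D) // 4         # 4ac = b^2 - D
        a = max(abs(b), 1)
        while a * a <= ac:            # a <= c is equivalent to a^2 <= ac
            if ac % a == 0:
                c = ac // a
                boundary = abs(b) == a or a == c
                if not (boundary and b < 0) and gcd(gcd(a, b), c) == 1:
                    count += 1
            a += 1
    return count


print([class_number_positive_definite(D) for D in (-3, -4, -20, -23, -56)])   # [1, 1, 2, 3, 4]
```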
Consider next the indefinite case:

$$f = ax^2 + bxy + cy^2,$$

where $a, b, c \in \mathbb{R}$ and $D > 0$. If $a \neq 0$, we can write

$$f = a(x - \xi y)(x - \eta y),$$

where $\xi, \eta$ are the distinct real roots of the polynomial $at^2 + bt + c$. It follows from Lemma 0 that, if $\xi$ and $\eta$ are irrational, then $f$ is properly equivalent to a form $f'$ for which $\xi' > 1$ and $-1 < \eta' < 0$. Such a quadratic form $f'$ is said to be reduced. Evidently $f'$ is reduced if and only if $-f'$ is reduced. Thus we may suppose $a' > 0$, and then $f'$ is reduced if and only if

$$0 < \sqrt{D} + b' < 2a' < \sqrt{D} - b'.$$


If the coefficients of $f$ are integers and the positive integer $D$ is not a square, then $a \neq 0$ and $\xi, \eta$ are conjugate real quadratic irrationals. In this case, as we already saw in Section 3, there are only finitely many reduced forms with discriminant $D$. For any integer $D > 0$ which is not a square, let $h^\dagger(D)$ denote the number of primitive quadratic forms with discriminant $D$ which are properly inequivalent. By what has been said, $h^\dagger(D)$ is finite.
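The inequalities characterizing reduced indefinite forms make the finiteness statement concrete: they force $-\sqrt{D} < b' < 0$ and $0 < a' < \sqrt{D}$. The sketch below (hypothetical name `reduced_indefinite_forms`) lists the reduced forms with integer coefficients and $a' > 0$ for a given non-square $D > 0$, doing every comparison with $\sqrt{D}$ in exact integer arithmetic. It lists reduced forms, not proper equivalence classes, so by itself it does not compute $h^\dagger(D)$.

```python
from math import isqrt


def reduced_indefinite_forms(D):
    """All integer (a, b, c) with a > 0, b^2 - 4ac = D and 0 < sqrt(D) + b < 2a < sqrt(D) - b."""
    s = isqrt(D) if D > 0 else 0
    if D <= 0 or s * s == D or D % 4 not in (0, 1):
        raise ValueError("D must be a positive non-square congruent to 0 or 1 mod 4")

    def sqrt_D_exceeds(x):
        # sqrt(D) > x for an integer x; equality cannot occur because sqrt(D) is irrational
        return x < 0 or x * x < D

    forms = []
    for a in range(1, s + 1):                 # 2a < sqrt(D) - b < 2 sqrt(D) forces a < sqrt(D)
        for b in range(-s, 0):                # the inequalities force -sqrt(D) < b < 0
            if (b * b - D) % (4 * a):
                continue
            c = (b * b - D) // (4 * a)
            if (sqrt_D_exceeds(-b)                     # sqrt(D) + b > 0
                    and not sqrt_D_exceeds(2 * a - b)  # sqrt(D) + b < 2a
                    and sqrt_D_exceeds(2 * a + b)):    # 2a < sqrt(D) - b
                forms.append((a, b, c))
    return forms


print(reduced_indefinite_forms(12))   # [(1, -2, -2), (2, -2, -1)]
```

For $D = 5$ it returns only $x^2 - xy - y^2$, whose roots $(1 \pm \sqrt{5})/2$ satisfy $\xi > 1$ and $-1 < \eta < 0$.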

It should be noted that, for any quadratic form $f$ with integer coefficients, the discriminant $D \equiv 0$ or $1 \bmod 4$. Moreover, for any $D \equiv 0$ or $1 \bmod 4$, there is a quadratic form $f$ with integer coefficients and with discriminant $D$; for example,

$$\begin{aligned}
f &= x^2 - Dy^2/4 && \text{if } D \equiv 0 \bmod 4,\\
f &= x^2 + xy + (1 - D)y^2/4 && \text{if } D \equiv 1 \bmod 4.
\end{aligned}$$
The preceding results for quadratic forms can also be restated in terms of quadratic fields. By associating with the ideal with basis $\beta = a$, $\gamma = b + c\omega$ in the quadratic field $\mathbb{Q}(\sqrt{d})$ the binary quadratic form

$$\{\beta\beta'x^2 + (\beta\gamma' + \beta'\gamma)xy + \gamma\gamma'y^2\}/ac,$$

one can establish a bijective map between ‘strict’ equivalence classes of ideals in $\mathbb{Q}(\sqrt{d})$ and proper equivalence classes of binary quadratic forms with discriminant $D$, where

$$\begin{aligned}
D &= 4d && \text{if } d \equiv 2 \text{ or } 3 \bmod 4,\\
D &= d && \text{if } d \equiv 1 \bmod 4.
\end{aligned}$$

(The middle coefficient $b$ of $f = ax^2 + bxy + cy^2$ was not required to be even in order to obtain this one-to-one correspondence.) Since any ideal class is either a strict ideal class or the union of two strict ideal classes, the finiteness of the class number $h(d)$ of the quadratic field $\mathbb{Q}(\sqrt{d})$ thus follows from the finiteness of $h^\dagger(D)$.
6 Non-Euclidean Geometry


There is an important connection between the modular group and the non-Euclidean 
geometry of Bolyai (1832) and Lobachevski (1829). It was first pointed out by 
Beltrami (1868) that their hyperbolic geometry is the geometry on a manifold of constant curvature. In the model of Poincaré (1882) for two-dimensional hyperbolic geometry the underlying space is taken to be the upper half-plane $\mathcal{H}$. A ‘line’ is either a
semi-circle with centre on the real axis or a half-line perpendicular to the real axis. It 
follows that through any two distinct points there passes exactly one ‘line’. However, 
through a given point not on a given ‘line’ there passes more than one ‘line’ having no 
point in common with the given ‘line’. 

Although Euclid’s parallel axiom fails to hold, all the other axioms of Euclidean 
geometry are satisfied. Poincaré’s model shows that if Euclidean geometry is free from 
contradiction, then so also is hyperbolic geometry. Before the advent of non-Euclidean 
geometry there had been absolute faith in Euclidean geometry. It is realized today that 
it is a matter for experiment to determine what kind of geometry best describes our 
physical world. 

Poincaré’s model will now be examined in more detail (with the constant curvature normalized to have the value $-1$). A curve $\gamma$ in $\mathcal{H}$ is specified by a continuously differentiable function $z(t) = x(t) + iy(t)$ $(a \leq t \leq b)$. The (hyperbolic) length of $\gamma$ is defined to be

$$\ell(\gamma) = \int_a^b y(t)^{-1}|dz/dt|\,dt.$$

It follows from this definition that the ‘line’ segment joining two points $z, w$ of $\mathcal{H}$ has length

$$d(z, w) = \ln\frac{|z - \bar{w}| + |z - w|}{|z - \bar{w}| - |z - w|}.$$

It may be shown that any other curve joining $z$ and $w$ has greater length. Thus the ‘lines’ are geodesics.
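The distance formula is easy to evaluate numerically. The sketch below (hypothetical names `hyperbolic_distance` and `lft`) checks two consequences of the definitions. On the imaginary axis the length integral reduces to $\int dy/y$, so $d(i, iy) = \ln y$ for $y > 1$. And a real linear fractional transformation with $ad - bc = 1$ leaves $d$ unchanged, as stated below.

```python
import math


def hyperbolic_distance(z, w):
    """d(z, w) = ln((|z - conj(w)| + |z - w|) / (|z - conj(w)| - |z - w|)) on the upper half-plane."""
    s = abs(z - w.conjugate())
    t = abs(z - w)
    return math.log((s + t) / (s - t))


def lft(a, b, c, d, z):
    """Linear fractional transformation z -> (a z + b)/(c z + d)."""
    return (a * z + b) / (c * z + d)


z, w = complex(0.3, 0.8), complex(-1.2, 2.5)
for a, b, c, d in [(2, 1, 1, 1), (0.5, 3.0, 0.0, 2.0), (0, -1, 1, 0)]:   # each has ad - bc = 1
    assert math.isclose(hyperbolic_distance(z, w),
                        hyperbolic_distance(lft(a, b, c, d, z), lft(a, b, c, d, w)))
print(hyperbolic_distance(1j, 3j))     # ln 3, about 1.0986
```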

For any $z_0 \in \mathcal{H}$, there is a unique geodesic through $z_0$ in any specified direction. Also, for any distinct real numbers $\xi, \eta$, there is a unique geodesic which intersects the real axis at $\xi, \eta$, namely the semicircle with centre at $(\xi + \eta)/2$. (By abuse of language we say ‘$\xi$’, for example, when we mean the point $(\xi, 0)$.)

A linear fractional transformation

$$z' = f(z) = (az + b)/(cz + d),$$

where $a, b, c, d \in \mathbb{R}$ and $ad - bc = 1$, maps the upper half-plane $\mathcal{H}$ onto itself and maps ‘lines’ onto ‘lines’. Moreover, if the curve $\gamma$ is mapped onto the curve $\gamma'$, then $\ell(\gamma) = \ell(\gamma')$, since $\operatorname{Im} f(z) = \operatorname{Im} z/|cz + d|^2$ and $|df/dz| = 1/|cz + d|^2$. In particular,

$$d(z, w) = d(z', w').$$

Thus a linear fractional transformation of the above form is an isometry. It may be shown that any isometry is either a linear fractional transformation of this form or is obtained by composing such a transformation with the (orientation-reversing) transformation $x + iy \mapsto -x + iy$. For any two ‘lines’ $L$ and $L'$, there is an isometry which maps $L$ onto $L'$.

We may define angles to be the same as in Euclidean geometry, since any linear fractional transformation is conformal. The (hyperbolic) area of a domain $D \subseteq \mathcal{H}$, defined by

$$\mu(D) = \iint_D y^{-2}\,dx\,dy,$$

is invariant under any isometry. In particular, this gives $\pi - (\alpha + \beta + \gamma)$ for the area of a ‘triangle’ with angles $\alpha, \beta, \gamma$. Since the angles are non-negative, the area of a ‘triangle’ is at most $\pi$ and, since the area is necessarily positive, the sum of the angles of a ‘triangle’ is less than $\pi$.

For example, if $F$ is the fundamental domain of the modular group $\Gamma$, then $F$ is a ‘triangle’ with angles $\pi/3, \pi/3, 0$ and hence the area of $F$ is $\pi - 2\pi/3 = \pi/3$. For any fixed $z_0 \in F$ on the imaginary axis, we may characterize $F$ as the set of all $z \in \mathcal{H}$ such that, for every $g \in \Gamma$ with $g \neq I$,

$$d(z, z_0) < d(z, g(z_0)) = d(g^{-1}(z), z_0).$$
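As a check on the value $\pi/3$ found above for the area of $F$, the area can also be computed directly from the definition of $\mu$ (a worked computation added for illustration). For $|x| < 1/2$ the points of $F$ with real part $x$ are exactly those with $y > \sqrt{1 - x^2}$, so

$$\mu(F) = \int_{-1/2}^{1/2}\int_{\sqrt{1-x^2}}^{\infty}\frac{dy}{y^2}\,dx = \int_{-1/2}^{1/2}\frac{dx}{\sqrt{1-x^2}} = 2\arcsin\tfrac{1}{2} = \frac{\pi}{3}.$$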


By identifying two points $z, z'$ of $\mathcal{H}$ if $z' = g(z)$ for some $g \in \Gamma$ we obtain the quotient space $\mathcal{M} = \mathcal{H}/\Gamma$. Equivalently, we may regard $\mathcal{M}$ as the closure $\overline{F}$ of the fundamental domain $F$ with the boundary point $-1/2 + iy$ identified with the boundary point $1/2 + iy$ $(\sqrt{3}/2 \leq y < \infty)$ and the boundary point $-e^{-i\theta}$ identified with the boundary point $e^{i\theta}$ $(\pi/3 \leq \theta \leq \pi/2)$.

Since the elements of $\Gamma$ are isometries of $\mathcal{H}$, the metric on $\mathcal{H}$ induces a metric on $\mathcal{M}$ in which the geodesics are the projections onto $\mathcal{M}$ of the geodesics in $\mathcal{H}$. Thus if we regard $\mathcal{M}$ as $\overline{F}$ with appropriate boundary points identified, then a geodesic in $\mathcal{M}$ will be a sequence of geodesic arcs in $\overline{F}$, each with initial point and endpoint on the boundary of $\overline{F}$, so that the initial point of one arc is the point identified to the endpoint of the preceding arc.

Let $L$ be a geodesic in $\mathcal{H}$ which intersects the real axis in irrational points $\xi, \eta$ such that $\xi > 1$, $-1 < \eta < 0$ and let

$$\xi = [a_0, a_1, a_2, \ldots], \quad -1/\eta = [a_{-1}, a_{-2}, \ldots]$$

be the continued fraction expansions of $\xi$ and $-1/\eta$. If we choose $\xi$ and $\eta = \xi'$ to be conjugate quadratic irrationals then, by Proposition 7, the doubly-infinite sequence

$$[\ldots, a_{-2}, a_{-1}, a_0, a_1, a_2, \ldots]$$

is periodic and it is not difficult to see that the geodesic in $\mathcal{M}$ obtained by projection from $L$ is closed. Artin (1924) showed that there are other geodesics which behave very differently. Let the convergents of $\xi$ be $p_n/q_n$ and put

$$\xi = (p_{n-1}\xi_n + p_{n-2})/(q_{n-1}\xi_n + q_{n-2}), \quad \eta = (p_{n-1}\eta_n + p_{n-2})/(q_{n-1}\eta_n + q_{n-2}).$$
Then

$$\xi_n = [a_n, a_{n+1}, \ldots], \quad -1/\eta_n = [a_{n-1}, a_{n-2}, \ldots],$$

and $\xi_n > 1$, $-1 < \eta_n < 0$. Moreover, if $n$ is even, then $\xi$ and $\eta$ are properly equivalent to $\xi_n$ and $\eta_n$ respectively. If we choose $\xi$ so that the sequence $a_0, a_1, a_2, \ldots$ contains each finite sequence of positive integers (and hence contains it infinitely often), then the corresponding geodesic in $\mathcal{M}$ passes arbitrarily close to every point of $\mathcal{M}$ and to every direction at that point.

Some much-studied subgroups of the modular group are the congruence subgroups $\Gamma(n)$, consisting of all linear fractional transformations $z \mapsto (az + b)/(cz + d)$ in $\Gamma$ congruent to the identity transformation, i.e.

$$a \equiv d \equiv \pm 1, \quad b \equiv c \equiv 0 \bmod n.$$

We may in the same way investigate the geodesics in the quotient space $\mathcal{H}/\Gamma(n)$. In the case $n = 3$ it has been shown by Lehner and Sheingorn (1984) that there is an interesting connection with the Markov spectrum.

In Section 2 we defined, for any irrational number $\xi$ with convergents $p_n/q_n$,

$$M(\xi) = \limsup_{n \to \infty} q_n^{-1}|q_n\xi - p_n|^{-1},$$

and we noted that $M(\xi) = M(\eta)$ if $\xi$ and $\eta$ are equivalent. It is not difficult to show that there are uncountably many inequivalent $\xi$ for which $M(\xi) = 3$. However, it was shown by Markov (1879/80) that there is a sequence of real quadratic irrationals $\xi^{(k)}$ such that $M(\xi) < 3$ if and only if $\xi$ is equivalent to $\xi^{(k)}$ for some $k$. If $\mu_k = M(\xi^{(k)})$, then $\mu_1 < \mu_2 < \mu_3 < \cdots$ and $\mu_k \to 3$ as $k \to \infty$. Although $\mu_k$ is irrational, $\mu_k^2$ is rational. The first few values are

$$\begin{aligned}
\mu_1 &= 5^{1/2} = 2.236\ldots, & \mu_2 &= 8^{1/2} = 2.828\ldots,\\
\mu_3 &= (221)^{1/2}/5 = 2.973\ldots, & \mu_4 &= (1517)^{1/2}/13 = 2.996\ldots
\end{aligned}$$

As we already showed in Section 2, we can take $\xi^{(1)} = (1 + \sqrt{5})/2$ and $\xi^{(2)} = 1 + \sqrt{2}$.
Lehner and Sheingorn showed that the simple closed geodesics in $\mathcal{H}/\Gamma(3)$ are just the projections of the geodesics in $\mathcal{H}$ whose endpoints $\xi, \eta$ on the real axis are conjugate quadratic irrationals equivalent to $\xi^{(k)}$ for some $k$.
There is a recursive procedure for calculating the quantities $\mu_k$ and $\xi^{(k)}$. A Markov triple is a triple $(u, v, w)$ of positive integers such that

$$u^2 + v^2 + w^2 = 3uvw.$$

If $(u, v, w)$ is a Markov triple, then so also are $(3uw - v, u, w)$ and $(3uv - w, u, v)$. They are distinct from the original triple if $u = \max(u, v, w)$, since then $u < 3uw - v$ and $u < 3uv - w$. They are also distinct from one another if $w < v$. Starting from the trivial triple $(1, 1, 1)$, all Markov triples can be obtained by repeated applications of this process. The successive values of $u = \max(u, v, w)$ are $1, 2, 5, 13, 29, \ldots$. The numbers $\mu_k$ and $\xi^{(k)}$ are the corresponding successive values of $(9 - 4/u^2)^{1/2}$ and $(9 - 4/u^2)^{1/2}/2 + 1/2 + v/uw$.
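The recursive procedure can be sketched as a depth-first search from $(1, 1, 1)$ (the names `markov_numbers` and `mu` are hypothetical). After sorting each triple so that $u = \max(u, v, w)$, the search applies the two moves above. Both moves increase the maximum, so the search can stop at a given bound without missing any smaller Markov number.

```python
from math import sqrt


def markov_numbers(limit):
    """Markov numbers up to limit, reached from (1, 1, 1) by the two moves in the text."""
    seen = set()
    stack = [(1, 1, 1)]
    while stack:
        u, v, w = sorted(stack.pop(), reverse=True)     # arrange u = max(u, v, w) and v >= w
        if u > limit or (u, v, w) in seen:
            continue
        assert u * u + v * v + w * w == 3 * u * v * w  # every triple produced is a Markov triple
        seen.add((u, v, w))
        stack.append((3 * u * w - v, u, w))             # both moves increase the maximum,
        stack.append((3 * u * v - w, u, v))             # so pruning at `limit` loses nothing
    return sorted({u for u, _, _ in seen})


def mu(u):
    """The value (9 - 4/u^2)^(1/2) attached to the Markov number u."""
    return sqrt(9 - 4 / u ** 2)


print(markov_numbers(100))      # [1, 2, 5, 13, 29, 34, 89]
```

For example, `mu(13)` agrees with $\mu_4 = (1517)^{1/2}/13$ to floating-point accuracy.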
It was conjectured by Frobenius (1913) that a Markov triple is uniquely determined
by its greatest element. This has been verified whenever the greatest element does not 
exceed 10!*. It has also been proved when the greatest element is a prime (and in 
some other cases) by Baragar (1996), using the theory of quadratic fields. 
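There is an important analogue of the continued fraction algorithm for infinite series. Let $K$ be an arbitrary field and let $\mathcal{F}$ denote the set of all formal Laurent series

$$f = \sum_{n \in \mathbb{Z}} \alpha_n t^n$$

with coefficients $\alpha_n \in K$ such that $\alpha_n \neq 0$ for at most finitely many $n > 0$. If

$$g = \sum_{n \in \mathbb{Z}} \beta_n t^n$$

is also an element of $\mathcal{F}$, and if we define addition and multiplication by

$$f + g = \sum_{n \in \mathbb{Z}} (\alpha_n + \beta_n)t^n, \qquad fg = \sum_{n \in \mathbb{Z}} \gamma_n t^n,$$

where $\gamma_n = \sum_{j+k=n} \alpha_j\beta_k$, then $\mathcal{F}$ acquires the structure of a commutative ring. In fact, $\mathcal{F}$ is a field. For, if $f = \sum_{n \leq v} \alpha_n t^n$, where $\alpha_v \neq 0$, we obtain $g = \sum_{n \leq -v} \beta_n t^n$ such that $fg = 1$ by solving successively the equations

$$\begin{aligned}
\alpha_v\beta_{-v} &= 1,\\
\alpha_v\beta_{-v-1} + \alpha_{v-1}\beta_{-v} &= 0,\\
\alpha_v\beta_{-v-2} + \alpha_{v-1}\beta_{-v-1} + \alpha_{v-2}\beta_{-v} &= 0,\\
&\;\;\vdots
\end{aligned}$$

Define the absolute value of an element $f = \sum_{n \in \mathbb{Z}} \alpha_n t^n$ of $\mathcal{F}$ by putting

$$|0| = 0, \qquad |f| = 2^{v(f)} \ \text{if}\ f \neq 0,$$

where $v(f)$ is the greatest integer $n$ such that $\alpha_n \neq 0$. It is easily verified that

$$|fg| = |f||g|, \qquad |f + g| \leq \max(|f|, |g|),$$

and $|f + g| = \max(|f|, |g|)$ if $|f| \neq |g|$. For any $f = \sum_{n \in \mathbb{Z}} \alpha_n t^n \in \mathcal{F}$, let

$$\lfloor f \rfloor = \sum_{n \geq 0} \alpha_n t^n, \qquad \{f\} = \sum_{n < 0} \alpha_n t^n$$

denote respectively its polynomial and strictly proper parts. Then $|\{f\}| < 1$, and

$$|\lfloor f \rfloor| = |f| \geq 1 \ \text{if}\ \lfloor f \rfloor \neq 0.$$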
If $f_0 := f$ is not the formal Laurent series of a rational function, we can write

$$f_0 = a_0 + 1/f_1,$$

where $a_0 = \lfloor f_0 \rfloor$ and $|f_1| > 1$. In the same way,

$$f_1 = a_1 + 1/f_2,$$

where $a_1 = \lfloor f_1 \rfloor$ and $|f_2| > 1$. Continuing in this way, we obtain the continued fraction expansion $[a_0, a_1, a_2, \ldots]$ of $f$. In the same way as for real numbers, if we define polynomials $p_n, q_n$ by the recurrence relations

$$p_n = a_np_{n-1} + p_{n-2}, \quad q_n = a_nq_{n-1} + q_{n-2} \quad (n \geq 0),$$

with $p_{-2} = q_{-1} = 0$, $p_{-1} = q_{-2} = 1$, then

$$\begin{aligned}
p_nq_{n-1} - p_{n-1}q_n &= (-1)^{n+1} \quad (n \geq 0),\\
f &= (p_nf_{n+1} + p_{n-1})/(q_nf_{n+1} + q_{n-1}) \quad (n \geq 0),
\end{aligned}$$

and so on. In addition, however, we now have

$$|a_n| = |f_n| > 1 \quad (n \geq 1),$$

from which we obtain by induction

$$|p_n| = |a_n||p_{n-1}| > |p_{n-1}|, \quad |q_n| = |a_n||q_{n-1}| > |q_{n-1}| \quad (n \geq 1).$$

Hence

$$|p_n| = |a_0a_1\cdots a_n|, \quad |q_n| = |a_1\cdots a_n| \quad (n \geq 1).$$

From the relation $q_nf - p_n = (-1)^n/(q_nf_{n+1} + q_{n-1})$ we further obtain

$$|q_nf - p_n| = |q_{n+1}|^{-1},$$

since

$$|q_nf_{n+1} + q_{n-1}| = |q_nf_{n+1}| = |q_n||a_{n+1}| = |q_{n+1}|.$$

In particular, $|q_nf - p_n| < 1$ and hence

$$p_n = \lfloor q_nf \rfloor, \quad |\{q_nf\}| = |q_{n+1}|^{-1} \quad (n \geq 0).$$

Thus $p_n$ is readily determined from $q_n$. Furthermore,

$$|f - p_n/q_n| = |q_n|^{-1}|q_{n+1}|^{-1} \to 0 \quad \text{as } n \to \infty.$$

The rational function $p_n/q_n$ is called the $n$-th convergent of $f$. The polynomials $a_n$ are called the partial quotients, and the Laurent series $f_n$ the complete quotients, in the continued fraction expansion of $f$.

The continued fraction algorithm can also be applied when $f$ is the formal Laurent expansion of a rational function, but in this case the process terminates after a finite number of steps. If $a_0, a_1, a_2, \ldots$ is any finite or infinite sequence of polynomials with $|a_n| > 1$ for $n \geq 1$, there is a unique formal Laurent series $f$ with $[a_0, a_1, a_2, \ldots]$ as its continued fraction expansion.
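For a rational function the algorithm terminates, and it is then simply Euclid’s algorithm for polynomials: $a_n$ is the quotient, and the remainder gives the next complete quotient. The following sketch assumes $K = \mathbb{Q}$, with polynomials stored as lists of `Fraction` coefficients, lowest degree first; all helper names are hypothetical. It computes the partial quotients and the convergents $p_n/q_n$ from the recurrences above.

```python
from fractions import Fraction

# Polynomials over Q are lists of Fractions, lowest degree first; [] is the zero polynomial.


def trim(p):
    """Drop trailing zero coefficients."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def poly_add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)])


def poly_mul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return trim(out)


def poly_divmod(num, den):
    """Quotient and remainder: num = quot * den + rem with deg rem < deg den (den nonzero, trimmed)."""
    rem = trim(num)
    quot = [Fraction(0)] * max(len(rem) - len(den) + 1, 0)
    while rem and len(rem) >= len(den):
        shift = len(rem) - len(den)
        coef = rem[-1] / den[-1]
        quot[shift] = coef
        for i, d in enumerate(den):
            rem[i + shift] -= coef * d
        rem = trim(rem)
    return trim(quot), rem


def continued_fraction(num, den):
    """Partial quotients a_0, a_1, ... of the rational function num/den."""
    num = trim(Fraction(x) for x in num)
    den = trim(Fraction(x) for x in den)
    if not den:
        raise ZeroDivisionError("denominator is the zero polynomial")
    quotients = []
    while den:
        a, r = poly_divmod(num, den)   # num/den = a + r/den with |r/den| < 1, so a is the polynomial part
        quotients.append(a)
        num, den = den, r              # the next complete quotient is den/r
    return quotients


def convergents(quotients):
    """Pairs (p_n, q_n) with p_n = a_n p_{n-1} + p_{n-2} and q_n = a_n q_{n-1} + q_{n-2}."""
    p2, p1 = [], [Fraction(1)]         # p_{-2} = 0, p_{-1} = 1
    q2, q1 = [Fraction(1)], []         # q_{-2} = 1, q_{-1} = 0
    result = []
    for a in quotients:
        p = poly_add(poly_mul(a, p1), p2)
        q = poly_add(poly_mul(a, q1), q2)
        result.append((p, q))
        p2, p1, q2, q1 = p1, p, q1, q
    return result


qs = continued_fraction([1, 2, 0, 1], [1, 0, 1])   # f = (t^3 + 2t + 1)/(t^2 + 1)
cs = convergents(qs)
```

Here the partial quotients are $t$, $t - 1$ and $(t + 1)/2$, and the last convergent is $\frac{1}{2}(t^3 + 2t + 1)$ over $\frac{1}{2}(t^2 + 1)$, i.e. $f$ itself.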

For formal Laurent series there are sharper Diophantine properties than for real 
numbers: 
Proposition 14 Let $f$ be a formal Laurent series with convergents $p_n/q_n$ and let $p, q$ be polynomials with $q \neq 0$.

(i) If $|q| < |q_{n+1}|$ and $p/q \neq p_n/q_n$, then

$$|qf - p| \geq |q_{n-1}f - p_{n-1}| = |q_n|^{-1}.$$

(ii) If $|qf - p| < |q|^{-1}$, then $p/q$ is a convergent of $f$.

Proof (i) Assume on the contrary that $|qf - p| < |q_n|^{-1}$. Since

$$q_n(qf - p) - q(q_nf - p_n) = qp_n - pq_n \neq 0$$

and $|q_n||qf - p| < 1$, we must have

$$|q||q_{n+1}|^{-1} = |q||q_nf - p_n| = |qp_n - pq_n| \geq 1,$$

which is contrary to hypothesis.

(ii) Assume that $p/q$ is not a convergent of $f$. If $f = p_N/q_N$ is a rational function then $|q| < |q_N|$, since

$$1 \leq |qp_N - pq_N| = |qf - p||q_N| < |q|^{-1}|q_N|.$$

Thus, whether or not $f$ is rational, we can choose $n$ so that $|q_n| \leq |q| < |q_{n+1}|$. Hence, by (i),

$$|qf - p| \geq |q_n|^{-1} \geq |q|^{-1},$$

which is a contradiction.


It was shown by Abel (1826) that, for any complex polynomial $D(t)$ which is not a square, the ‘Pell’ equation $X^2 - D(t)Y^2 = 1$ has a solution in polynomials $X(t), Y(t)$ of positive degree if and only if $\sqrt{D(t)}$ may be represented as a periodic continued fraction: $\sqrt{D(t)} = [a_0, \overline{a_1, \ldots, a_h}]$, where $a_h = 2a_0$ and $a_i = a_{h-i}$ $(i = 1, \ldots, h-1)$ are polynomials of positive degree. By differentiation one obtains

$$XX'/Y = Y'D + (1/2)YD'.$$

It follows that $Y$ divides $X'$, since $X$ and $Y$ are relatively prime, and

$$(X + Y\sqrt{D})' = (X + Y\sqrt{D})X'/Y\sqrt{D}.$$

Thus the ‘abelian’ integral

$$\int X'(t)\,dt/Y(t)\sqrt{D(t)}$$

is actually the elementary function $\log\{X(t) + Y(t)\sqrt{D(t)}\}$.

Some remarkable results have recently been obtained on the approximation of alge- 
braic numbers by rational numbers, which deserve to be mentioned here, even though 
the proofs are beyond our scope. 
A complex number $\xi$ is said to be an algebraic number, or simply algebraic, of degree $d$ if it is a root of a polynomial of degree $d$ with rational coefficients which is irreducible over the rational field $\mathbb{Q}$. Thus an algebraic number of degree 2 is just a quadratic irrational.

For any irrational number $\xi$, there exist infinitely many rational numbers $p/q$ such that

$$|\xi - p/q| < 1/q^2,$$

since the inequality is satisfied by any convergent of $\xi$. It was shown by Roth (1955) that if $\xi$ is a real algebraic number of degree $d \geq 2$ then, for any given $\varepsilon > 0$, there exist only finitely many rational numbers $p/q$ with $q > 0$ such that

$$|\xi - p/q| < 1/q^{2+\varepsilon}.$$

The proof does not provide a bound for the magnitude of the rational numbers which satisfy the inequality, but it does provide a bound for their number. Roth’s result was the culmination of a line of research that was begun by Thue (1909), and further developed by Siegel (1921) and Dyson (1947).

A sharpening of Roth’s result has been conjectured by Lang (1965): if $\xi$ is a real algebraic number of degree $d \geq 2$ then, for any given $\varepsilon > 0$, there exist only finitely many rational numbers $p/q$ with $q > 1$ such that

$$|\xi - p/q| < 1/q^2(\log q)^{1+\varepsilon}.$$

An even stronger sharpening has been conjectured by P.M. Wong (1989) in which $(\log q)^{1+\varepsilon}$ is replaced by $(\log q)(\log\log q)^{1+\varepsilon}$ with $q > 2$.

For real algebraic numbers of degree 2 we already know more than this. For, if $\xi$ is a real quadratic irrational, its partial quotients are bounded and so there exists a constant $c = c(\xi) > 0$ such that $|\xi - p/q| > c/q^2$ for every rational number $p/q$. It is a long-standing conjecture that this is false for any real algebraic number $\xi$ of degree $d > 2$.

It is not difficult to show that Roth’s theorem may be restated in the following homogeneous form: if

$$L_1(u, v) = \alpha u + \beta v, \quad L_2(u, v) = \gamma u + \delta v$$

are linearly independent linear forms with algebraic coefficients $\alpha, \beta, \gamma, \delta$, then, for any given $\varepsilon > 0$, there exist at most finitely many integers $x, y$, not both zero, such that

$$|L_1(x, y)L_2(x, y)| < \max(|x|, |y|)^{-\varepsilon}.$$

The subspace theorem of W. Schmidt (1972) generalizes Roth’s theorem in this form to higher dimensions. In the stronger form given it by Vojta (1989) it says:

if $L_1(u), \ldots, L_n(u)$ are linearly independent linear forms in $n$ variables $u = (u_1, \ldots, u_n)$ with (real or complex) algebraic coefficients, then there exist finitely many proper linear subspaces $V_1, \ldots, V_h$ of $\mathbb{Q}^n$ such that every nonzero $x = (x_1, \ldots, x_n) \in \mathbb{Z}^n$ for which

$$|L_1(x)\cdots L_n(x)| < \|x\|^{-\varepsilon},$$
where $\|x\| = \max(|x_1|, \ldots, |x_n|)$, is contained in some subspace $V_j$, except for finitely many points whose number may depend on $\varepsilon$. A new proof of Schmidt’s subspace theorem has been given by Faltings and Wüstholz (1994). The subspace theorem has also been given a more quantitative form by Schmidt (1989) and Evertse (1996). These results have immediate applications to the simultaneous approximation of several algebraic numbers.

Vojta (1987) has developed a remarkable analogy between the approximation of 
algebraic numbers by rationals and the theory of Nevanlinna (1925) on the value dis- 
tribution of meromorphic functions, in which Roth’s theorem corresponds to Nevan- 
linna’s second main theorem. Although the analogy is largely formal, it is suggestive in 
both directions. It has already led to new proofs for the theorems of Roth and Schmidt, 
and to a proof of the Mordell conjecture (discussed below) which is quite different 
from the original proof by Faltings. 

Roth’s theorem has an interesting application to Diophantine equations. Let

$$f(z) = a_0z^n + a_1z^{n-1} + \cdots + a_n$$

be a polynomial of degree $n \geq 3$ with integer coefficients whose roots are distinct and not rational. Let

$$f(u, v) = a_0u^n + a_1u^{n-1}v + \cdots + a_nv^n$$

be the corresponding homogeneous polynomial and let $g(u, v)$ be a polynomial of degree $m \geq 0$ with integer coefficients. We will deduce from Roth’s theorem that the equation

$$f(x, y) = g(x, y)$$

has at most finitely many solutions in integers if $m \leq n - 3$. This was already proved by Thue for $m = 0$.

Assume on the contrary that there exist infinitely many solutions in integers. Without loss of generality we may assume that there exist infinitely many integer solutions $x, y$ for which $|x| \leq |y|$. Then there exists a constant $c_1 > 0$ such that

$$|g(x, y)| \leq c_1|y|^m.$$

Over the complex field $\mathbb{C}$ the homogeneous polynomial $f(u, v)$ has a factorization

$$f(u, v) = a_0\prod_{j=1}^{n}(u - \zeta_jv),$$

where $\zeta_1, \ldots, \zeta_n$ are distinct algebraic numbers which are not rational. For at least one $j$ we must have, for infinitely many $x, y$,

$$|a_0||x - \zeta_jy|^n \leq c_1|y|^m$$

and hence

$$|x - \zeta_jy| \leq c_2|y|^{m/n},$$
where $c_2 = (c_1/|a_0|)^{1/n}$. If $k \neq j$, then, for all sufficiently large $|y|$ (recall that $m < n$),

$$|x - \zeta_ky| \geq |(\zeta_j - \zeta_k)y| - |x - \zeta_jy| \geq c_3|y| - c_2|y|^{m/n} \geq c_4|y|,$$

where $c_3, c_4$ are positive constants. It follows that

$$|a_0||x - \zeta_jy|c_4^{n-1}|y|^{n-1} \leq |f(x, y)| = |g(x, y)| \leq c_1|y|^m$$

and hence

$$|\zeta_j - x/y| \leq c_5/|y|^{n-m},$$

where the positive constant $c_5$ depends only on the coefficients of $f$ and $g$. Evidently this implies that $\zeta_j$ is real. Since $\zeta_j$ is not rational and $m \leq n - 3$, we now obtain a contradiction to Roth’s theorem.

It is actually possible to characterize all polynomial Diophantine equations with infinitely many solutions. Let $F(x, y)$ be a polynomial with rational coefficients which is irreducible over $\mathbb{C}$. It was shown by Siegel (1929), by combining his own results on the approximation of algebraic numbers with results of Mordell and Weil concerning the rational points on elliptic curves and Jacobian varieties, that if the equation

$$F(x, y) = 0 \qquad (*)$$

has infinitely many integer solutions, then there exist polynomials or Laurent polynomials $\phi(t), \psi(t)$ (not both constant) with coefficients from either the rational field $\mathbb{Q}$ or a real quadratic field $\mathbb{Q}(\sqrt{d})$, where $d > 0$ is a square-free integer, such that $F(\phi(t), \psi(t))$ is identically zero. If $\phi(t), \psi(t)$ are Laurent polynomials with coefficients from $\mathbb{Q}(\sqrt{d})$, they may be chosen to be invariant when $t$ is replaced by $t^{-1}$ and the coefficients are replaced by their conjugates in $\mathbb{Q}(\sqrt{d})$.

This implies, in particular, that the algebraic curve defined by (*) may be transformed by a birational transformation with rational coefficients into either a linear equation $ax + by + c = 0$ or a Pellian equation $x^2 - dy^2 - m = 0$. It is not significant that the birational transformation has rational, rather than integral, coefficients since, by combining a result of Mahler (1934) with the Mordell conjecture, it may be seen that the same conclusions hold if the equation (*) has infinitely many solutions in rational numbers whose denominators involve only finitely many primes.

The conjecture of Mordell (1922) says that the equation (*) has at most finitely 
many rational solutions if the algebraic curve defined by (*) has genus g > 1. (The 
concept of genus will not be formally defined here, but we mention that the genus of an 
irreducible plane algebraic curve may be calculated by a procedure due to M. Noether.) 
The conjecture has now been proved by Faltings (1983), as will be mentioned in 
Chapter XIII. As mentioned also at the end of Chapter XII, if the algebraic curve 
defined by (*) has genus 1, then explicit bounds may be obtained for the number of 
integral points. It was already shown by Hilbert and Hurwitz (1890) that the algebraic curve defined by (*) has genus 0 if and only if it is birationally equivalent over $\mathbb{Q}$ either to a line or to a conic. There then exist rational functions $\phi(t), \psi(t)$ (not both constant) with coefficients either from $\mathbb{Q}$ or from a quadratic extension of $\mathbb{Q}$ such that $F(\phi(t), \psi(t))$ is identically zero. The coefficients may be taken from $\mathbb{Q}$ if the curve has at least one non-singular rational point.

Thus in retrospect, and quite unfairly, Siegel’s remarkable result may be seen as 
simply picking out those curves of genus 0 which have infinitely many integral points, 
a problem which had already been treated by Maillet (1919). 

In this connection it may be mentioned that the formula for Pythagorean triples
given in §5 of Chapter II may be derived from the parametrization of the unit circle
$x^2 + y^2 = 1$ by the rational functions

$$x(t) = (1 - t^2)/(1 + t^2), \quad y(t) = 2t/(1 + t^2).$$
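The parametrization can be checked with exact rational arithmetic. The sketch below is illustrative only: the helper names `circle_point` and `triple_from_slope` are hypothetical, and the parameter is taken as $t = m/n$ with integers $0 < m < n$ (an assumption). Clearing denominators in $x(m/n), y(m/n)$ gives the integer triple $(n^2 - m^2, 2mn, n^2 + m^2)$, which is then made primitive.

```python
# Sketch; all function names here are illustrative (hypothetical).
from fractions import Fraction
from math import gcd


def circle_point(t):
    """Rational point (x(t), y(t)) on x^2 + y^2 = 1 from the parametrization."""
    t = Fraction(t)
    return (1 - t * t) / (1 + t * t), 2 * t / (1 + t * t)


def triple_from_slope(m, n):
    """Primitive Pythagorean triple from the parameter t = m/n, assuming 0 < m < n.

    Clearing denominators in x(m/n), y(m/n) gives (n^2 - m^2, 2mn, n^2 + m^2);
    dividing by the gcd of the three numbers makes the triple primitive.
    """
    a, b, c = n * n - m * m, 2 * m * n, n * n + m * m
    g = gcd(gcd(a, b), c)
    return a // g, b // g, c // g


# Every rational parameter gives a rational point on the unit circle.
for t in [Fraction(1, 2), Fraction(2, 3), Fraction(-5, 7)]:
    x, y = circle_point(t)
    assert x * x + y * y == 1

print(triple_from_slope(1, 2))  # (3, 4, 5)
print(triple_from_slope(2, 3))  # (5, 12, 13)
```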


8 Further Remarks 


More extensive accounts of the theory of continued fractions are given in the books 
of Rockett and Szusz [45] and Perron [41]. Many historical references are given in 
Brezinski [12]. The first systematic account of the subject, which it is still a delight to 
read, was given in 1774 by Lagrange [32] in his additions to the French translation of 
Euler’s Algebra. 

The continued fraction algorithm is such a useful tool that there have been many attempts to generalize it to higher dimensions. Jacobi, in a paper published posthumously (1868), defined a continued fraction algorithm in $\mathbb{R}^3$. Perron (1907) extended his definition to $\mathbb{R}^n$ and proved that convergence holds in the following weak sense: for a given nonzero $x \in \mathbb{R}^n$, the Jacobi-Perron algorithm constructs recursively a sequence of bases $B^{(k)} = \{b_1^{(k)}, \ldots, b_n^{(k)}\}$ of $\mathbb{Z}^n$ such that, for each $j \in \{1, \ldots, n\}$, the angle between the line $Ob_j^{(k)}$ and the line $Ox$ tends to zero as $k \to \infty$. More recently, other algorithms have been proposed for which convergence holds in the strong sense that, for each $j \in \{1, \ldots, n\}$, the distance of $b_j^{(k)}$ from the line $Ox$ tends to zero as $k \to \infty$. See Brentjes [11], Ferguson [22], Just [28] and Lagarias [31].

Proposition 2 was first proved by Serret [51]. Proposition 3 was proved by 
Lagrange. The complete characterization of best approximations is proved in the book 
of Perron. 

Lambert (1766) proved that $\pi$ was irrational by using a continued fraction expansion for $\tan x$. For the continued fraction expansion of $\pi$, see Choong et al. [15]. Badly approximable numbers are thoroughly surveyed by Shallit [52].

The theory of Diophantine approximation is treated more comprehensively in the 
books of Koksma [30], Cassels [13] and Schmidt [47]. 

The estimate $O(\sqrt{D}\log D)$ for the period of the continued fraction expansion of a quadratic irrational with discriminant $D$ is proved by elementary means in the book of Rockett and Szusz. Further references are given in Podsypanin [42].
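To get a feel for the size of these periods, the sketch below computes the period length of the continued fraction of $\sqrt{d}$ with the standard recurrence for complete quotients and prints it next to $\sqrt{D}\log D$, taking $D = 4d$ as the discriminant of $x^2 - d$. The function name `sqrt_cf_period` is hypothetical, and since the implied constant in the $O$-estimate is not specified, the printout is only a comparison, not a check of the bound.

```python
# Sketch; the function name is illustrative (hypothetical).
from math import isqrt, log, sqrt


def sqrt_cf_period(d):
    """Length of the period of the continued fraction of sqrt(d), d not a square.

    Complete quotients have the form (m + sqrt(d)) / q and satisfy
    m' = a*q - m, q' = (d - m'^2) / q, a' = floor((a0 + m') / q').
    The period ends at the first partial quotient equal to 2*a0.
    """
    a0 = isqrt(d)
    if a0 * a0 == d:
        raise ValueError("d must not be a perfect square")
    m, q, a = 0, 1, a0
    length = 0
    while a != 2 * a0:
        m = a * q - m
        q = (d - m * m) // q
        a = (a0 + m) // q
        length += 1
    return length


for d in [2, 7, 13, 94, 151]:
    D = 4 * d  # discriminant of x^2 - d
    print(d, sqrt_cf_period(d), round(sqrt(D) * log(D), 1))
```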

The ancient Hindu method of solving Pell’s equation is discussed in Selenius [49]. 
Tables for solving the Diophantine equation $x^2 - dy^2 = m$, where $m^2 < d$, are given in Patz [39]. Pell's equation plays a role in the negative solution of Hilbert's tenth problem, which asks for an algorithm to determine whether an arbitrary polynomial Diophantine equation is solvable in integers. See Davis et al. [18] and Jones and Matijasevic [26].
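For comparison with the historical methods mentioned above, here is a minimal sketch that swaps in the continued fraction method (not the ancient Hindu cyclic method): it walks through the convergents $h/k$ of $\sqrt{d}$ until one satisfies $h^2 - dk^2 = 1$, which yields the least positive solution of Pell's equation. The function name `pell_fundamental` is hypothetical.

```python
# Sketch; the function name is illustrative (hypothetical).
from math import isqrt


def pell_fundamental(d):
    """Smallest positive solution of x^2 - d y^2 = 1 (d > 0 not a square).

    Walks through the convergents h/k of the continued fraction of sqrt(d)
    until one satisfies the equation.
    """
    a0 = isqrt(d)
    if a0 * a0 == d:
        raise ValueError("d must not be a perfect square")
    m, q, a = 0, 1, a0
    h_prev, h = 1, a0  # numerators of consecutive convergents
    k_prev, k = 0, 1   # denominators of consecutive convergents
    while h * h - d * k * k != 1:
        m = a * q - m
        q = (d - m * m) // q
        a = (a0 + m) // q
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
    return h, k


print(pell_fundamental(13))  # (649, 180)
print(pell_fundamental(61))  # (1766319049, 226153980)
```

The case $d = 61$ illustrates why a systematic method is needed: the least solution is already large even though $d$ is small.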
The continued fraction construction for the representation of a prime $p \equiv 1 \bmod 4$ as a sum of two squares is due to Legendre. Some other constructions are given in Chapter V of Davenport [17] and in Wagon [61]. A construction for the representation of any positive integer as a sum of four squares is given by Rousseau [46].
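A minimal sketch of one continued-fraction construction of this kind follows; it is not claimed to reproduce Legendre's exact formulation, and the name `two_squares` is hypothetical. The complete quotients of $\sqrt{p}$ are $(P_k + \sqrt{p})/Q_k$ with $Q_{k+1}Q_k = p - P_{k+1}^2$, so as soon as two consecutive $Q$'s coincide we obtain $p = P_{k+1}^2 + Q_k^2$. For a prime $p \equiv 1 \bmod 4$ the period has odd length, and by the symmetry of the period such a coincidence occurs in its middle.

```python
# Sketch; the function name is illustrative (hypothetical).
from math import isqrt


def two_squares(p):
    """Write a prime p = 1 mod 4 as x^2 + y^2 via the continued fraction of sqrt(p).

    Complete quotients are (P_k + sqrt(p)) / Q_k with P_{k+1} = a_k Q_k - P_k
    and Q_{k+1} Q_k = p - P_{k+1}^2, so Q_{k+1} = Q_k gives p = P_{k+1}^2 + Q_k^2.
    """
    if p % 4 != 1:
        raise ValueError("expected a prime p = 1 mod 4")
    r = isqrt(p)
    P, Q, a = 0, 1, r
    for _ in range(2 * p):  # generous bound; for primes p = 1 mod 4 we exit within one period
        P_next = a * Q - P
        Q_next = (p - P_next * P_next) // Q
        if Q_next == Q:
            return P_next, Q
        P, Q = P_next, Q_next
        a = (P + r) // Q
    raise ValueError("no representation found; is p prime?")


print(two_squares(13))  # (2, 3)
print(two_squares(29))  # (2, 5)
```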

The modular group is the basic example of a Fuchsian group, i.e. a discrete subgroup of the group $PSL_2(\mathbb{R})$ of all linear fractional transformations $z \mapsto (az + b)/(cz + d)$, where $a, b, c, d \in \mathbb{R}$ and $ad - bc = 1$. Fuchsian groups are studied from different points of view in the books of Katok [29], Beardon [7], Lehner [36], and Vinberg and Shvartsman [58].

The significance of Fuchsian groups stems in part from the uniformization theo- 
rem, which characterizes Riemann surfaces. A Riemann surface is a 1-dimensional 
complex manifold. Two Riemann surfaces are conformally equivalent if there is a 
bijective holomorphic map from one to the other. The uniformization theorem, first 
proved by Koebe and Poincaré independently in 1907, says that any Riemann surface 
is conformally equivalent to exactly one of the following: 


(i) the complex plane $\mathbb{C}$,

(ii) the Riemann sphere $\mathbb{C} \cup \{\infty\}$,

(iii) the cylinder $\mathbb{C}/G$, where $G$ is the cyclic group generated by the translation $z \mapsto z + 1$,

(iv) a torus $\mathbb{C}/G$, where $G$ is the abelian group generated by the translations $z \mapsto z + 1$ and $z \mapsto z + \tau$ for some $\tau \in \mathcal{H}$ (the upper half-plane),

(v) a quotient space $\mathcal{H}/G$, where $G$ is a Fuchsian group which acts freely on $\mathcal{H}$, i.e. if $z \in \mathcal{H}$, $g \in G$ and $g \neq I$, then $g(z) \neq z$.


(It should be noted that, since the modular group does not act freely on $\mathcal{H}$, the corresponding ‘Riemann surface’ is ramified.) For more information on the uniformization theorem, see Abikoff [1], Bers [9], Farkas and Kra [21], Jost [27], Beardon and Stephenson [8], and He and Schramm [24].

For the equivalence between quadratic fields and binary quadratic forms, see Zagier [63]. The class number $h(d)$ of the quadratic field $\mathbb{Q}(\sqrt{d})$ has been deeply investigated, originally by exploiting this equivalence. Dirichlet (1839) obtained an
analytic formula for h(d) with the aid of his theorem on primes in an arithmetic pro- 
gression (which will be proved in Chapter X). A clearly motivated proof of Dirichlet’s 
formula is given in Hasse [23], and there are some interesting observations on the 
formula in Stark [56]. 
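Dirichlet's analytic formula is beyond a short sketch, so the code below swaps in the older approach through quadratic forms: for a negative discriminant $D$ it counts the reduced primitive positive definite forms $ax^2 + bxy + cy^2$ with $b^2 - 4ac = D$, i.e. $|b| \le a \le c$ and $b \ge 0$ whenever $|b| = a$ or $a = c$. For a fundamental discriminant this count is the class number of the imaginary quadratic field of discriminant $D$ (for squarefree $d < 0$, $D = d$ if $d \equiv 1 \bmod 4$ and $D = 4d$ otherwise). The name `class_number` and the use of $D$ rather than $d$ as argument are assumptions of this sketch.

```python
# Sketch; the function name is illustrative (hypothetical).
from math import gcd


def class_number(D):
    """Count reduced primitive positive definite forms of discriminant D < 0."""
    if D >= 0 or D % 4 not in (0, 1):
        raise ValueError("D must be a negative discriminant")
    count = 0
    a = 1
    while 3 * a * a <= -D:  # reduced forms satisfy 3a^2 <= |D|
        for b in range(-a + 1, a + 1):  # -a < b <= a
            if (b * b - D) % (4 * a):
                continue
            c = (b * b - D) // (4 * a)
            if c < a or (b < 0 and a == c):
                continue
            if gcd(gcd(a, b), c) == 1:
                count += 1
        a += 1
    return count


print([class_number(D) for D in (-3, -4, -23, -47, -163)])  # [1, 1, 3, 5, 1]
```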

It was conjectured by Gauss (1801), in the language of quadratic forms, that $h(d) \to \infty$ as $d \to -\infty$. This was first proved by Heilbronn (1934). Siegel (1935) showed that actually

$$\log h(d)/\log|d| \to 1/2 \quad \text{as } d \to -\infty.$$


Generalizations of these results to arbitrary algebraic number fields are given in books 
on algebraic number theory, e.g. Narkiewicz [38]. 

Siegel (1943) has given a natural generalization of the modular group to higher dimensions. Instead of the upper half-plane $\mathcal{H}$, we consider the space $\mathcal{H}_n$ of all complex $n \times n$ matrices $Z = X + iY$, where $X, Y$ are real symmetric matrices and $Y$ is positive definite. If the real $2n \times 2n$ matrix

$$M = \begin{pmatrix} A & B \\ C & D \end{pmatrix}$$

is symplectic, i.e. if $M^t J M = J$, where

$$J = \begin{pmatrix} O & I \\ -I & O \end{pmatrix},$$

then the linear fractional transformation $Z \mapsto (AZ + B)(CZ + D)^{-1}$ maps $\mathcal{H}_n$ onto itself. Siegel's modular group $\Gamma_n$ is the group of all such transformations for which $M$ has integer entries. The generalized upper half-plane $\mathcal{H}_n$ is itself just a special case of the vast theory of symmetric Riemannian spaces initiated by E. Cartan (1926/7). See Siegel [54] and Helgason [25].
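The action can be tried numerically. The sketch below restricts to $n = 2$ so that the inverse of $CZ + D$ can be written out by hand. It checks $M^tJM = J$ for two integer symplectic matrices: $J$ itself, which acts as $Z \mapsto -Z^{-1}$, and a translation $Z \mapsto Z + S$ with $S$ symmetric. It then confirms that the images of a sample $Z$ are again symmetric with positive definite imaginary part. The sample $Z$, the matrix $S$ and all function names are arbitrary illustrative choices (hypothetical).

```python
# Sketch for n = 2; all function names and sample matrices are illustrative (hypothetical).
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def standard_J(n):
    """The 2n x 2n matrix J = [[O, I], [-I, O]]."""
    top = [[0] * n + [int(i == j) for j in range(n)] for i in range(n)]
    bottom = [[-int(i == j) for j in range(n)] + [0] * n for i in range(n)]
    return top + bottom


def is_symplectic(M):
    J = standard_J(len(M) // 2)
    return matmul(matmul(transpose(M), J), M) == J


def act(M, Z):
    """Z -> (AZ + B)(CZ + D)^{-1} for a 2 x 2 complex matrix Z."""
    A = [row[:2] for row in M[:2]]
    B = [row[2:] for row in M[:2]]
    C = [row[:2] for row in M[2:]]
    D = [row[2:] for row in M[2:]]
    num = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(matmul(A, Z), B)]
    den = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(matmul(C, Z), D)]
    (p, q), (r, s) = den
    det = p * s - q * r
    return matmul(num, [[s / det, -q / det], [-r / det, p / det]])


def in_siegel_H2(Z, tol=1e-9):
    """Z symmetric with positive definite imaginary part (2 x 2 case)."""
    Y = [[z.imag for z in row] for row in Z]
    symmetric = abs(Z[0][1] - Z[1][0]) < tol
    return symmetric and Y[0][0] > 0 and Y[0][0] * Y[1][1] - Y[0][1] * Y[1][0] > 0


Z = [[0.3 + 2j, 0.1 + 0.5j], [0.1 + 0.5j, -0.2 + 1j]]
J2 = standard_J(2)
T = [[1, 0, 1, 2], [0, 1, 2, 0], [0, 0, 1, 0], [0, 0, 0, 1]]  # Z -> Z + S, S = [[1, 2], [2, 0]]
assert is_symplectic(J2) and is_symplectic(T)
print(in_siegel_H2(act(J2, Z)), in_siegel_H2(act(T, Z)))
```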

The development of non-Euclidean geometry is traced in Bonola [10]. (This 
edition also contains translations of works by Bolyai and Lobachevski.) The basic 
properties of Poincaré’s model, here only stated, are proved in the books of Katok [29] 
and Beardon [7]. 

For the connection between continued fractions and geodesics, see Artin [5] and 
Sheingorn [53]. For the Markov spectrum see not only the books of Cassels [13] and 
Rockett and Szusz [45], but also Cusick and Flahive [16] and Baragar [6]. 

The theory of continued fractions for formal Laurent series is developed further in 
de Mathan [37]. The corresponding theory of Diophantine approximation is surveyed 
in Lasjaunias [35]. The polynomial Pell equation is discussed by Schmidt [48]. For for- 
mal Laurent series there is a multidimensional generalization which is quite different 
from those for real numbers; see Antoulas [4]. 

Roth’s theorem and Schmidt’s subspace theorem are proved in Schmidt [47]. See 
also Faltings and Wustholz [20] and Evertse [19]. Nevanlinna’s theory of the value dis- 
tribution of meromorphic functions is treated in the recent book of Cherry and Ye [14]. 
For Vojta’s work see, for example, [59] and [60]. It should be noted, though, that this area is still in a state of flux and uses techniques beyond our scope. For an overview, see Lang [34].

Siegel’s theorem on Diophantine equations with infinitely many solutions is proved 
with the aid of non-standard analysis by Robinson and Roquette [44]; the proof is re- 
produced in Stepanov [57]. The theorem is discussed from the standpoint of Diophan- 
tine geometry in Serre [50]. Any algebraic curve over Q of genus zero which has a 
nonsingular rational point can be parametrized by rational functions effectively; see 
Poulakis [43]. 

It is worth noting that if F(x, y) is a polynomial with rational coefficients which 
is irreducible over Q, but not over C, then the curve F(x, y) = 0 has at most finitely 
many rational points. For any rational point is a common root of at least two distinct 
complex-irreducible factors of F and any two such factors have at most finitely many 
common complex roots. 

In conclusion we mention some further applications of continued fractions. A pro- 
cedure, due to Vincent (1836), for separating the roots of a polynomial with integer 
coefficients has acquired some practical value with the advent of modern computers. 
See Alesina and Galuzzi [3]. 

Continued fractions play a role in the small divisor problems of classical mechanics. As an example, suppose the function $f$ is holomorphic in some neighbourhood of the origin and $f(z) = \lambda z + O(z^2)$, where $\lambda = e^{2\pi i\theta}$ for some irrational $\theta$. It is readily shown that there exists a formal power series $h$ which linearizes $f$, i.e. $f(h(z)) = h(\lambda z)$. Brjuno (1971) proved that this formal power series converges in a neighbourhood of the origin if $\sum_{n \ge 0} (\log q_{n+1})/q_n < \infty$, where $q_n$ is the denominator of the $n$-th convergent of $\theta$. It was shown by Yoccoz (1995) that this condition is also necessary. In fact, if $\sum_{n \ge 0} (\log q_{n+1})/q_n = \infty$, the conclusion fails even for $f(z) = \lambda z(1 - z)$. See Yoccoz [62] and Pérez-Marco [40].
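The Brjuno condition involves only the convergent denominators, which are generated from the partial quotients $a_1, a_2, \ldots$ of $\theta = [a_0; a_1, a_2, \ldots]$ by $q_{-1} = 0$, $q_0 = 1$, $q_{n+1} = a_{n+1}q_n + q_{n-1}$. The sketch below computes partial sums of $\sum_n (\log q_{n+1})/q_n$. A finite partial sum can only suggest, never prove, convergence. The golden mean (all $a_n = 1$) is used because its denominators are Fibonacci numbers. The function names are hypothetical.

```python
# Sketch; all function names here are illustrative (hypothetical).
from math import log


def convergent_denominators(partial_quotients):
    """Denominators q_0, q_1, ... for theta = [a_0; a_1, a_2, ...].

    `partial_quotients` is [a_1, a_2, ...]; a_0 does not affect the q_n.
    """
    q_prev, q = 0, 1
    qs = [q]
    for a in partial_quotients:
        q_prev, q = q, a * q + q_prev
        qs.append(q)
    return qs


def brjuno_partial_sum(partial_quotients):
    """Partial sum of log(q_{n+1}) / q_n over the available convergents."""
    qs = convergent_denominators(partial_quotients)
    return sum(log(qs[n + 1]) / qs[n] for n in range(len(qs) - 1))


print(convergent_denominators([1] * 6))  # [1, 1, 2, 3, 5, 8, 13]
print(round(brjuno_partial_sum([1] * 40), 3))
```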

Our discussion of continued fractions has neglected their analytic theory. The out- 
standing work of Stieltjes (1894) on the problem of moments, which was extended by 
Hamburger (1920) and R. Nevanlinna (1922) from the half-line to the whole line, not 
only gave birth to the Stieltjes integral but also contributed to the development of func- 
tional analysis. For modern accounts, see Akhiezer [2], Landau [33] and Simon [55]. 
V


Hadamard’s Determinant Problem 


It was shown by Hadamard (1893) that, if all elements of an $n \times n$ matrix of complex numbers have absolute value at most $\mu$, then the determinant of the matrix has absolute value at most $\mu^n n^{n/2}$. For each positive integer $n$ there exist complex $n \times n$ matrices for which this upper bound is attained. For example, the upper bound is attained for $\mu = 1$ by the matrix $(\omega^{jk})$ $(1 \le j, k \le n)$, where $\omega$ is a primitive $n$-th root of unity. This matrix is real for $n = 1, 2$. However, Hadamard also showed that if the upper bound is attained for a real $n \times n$ matrix, where $n > 2$, then $n$ is divisible by 4.
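The example can be checked numerically. The sketch below builds $(\omega^{jk})$ $(1 \le j, k \le n)$ with the particular choice $\omega = e^{2\pi i/n}$ (an assumption; any primitive root would do) and compares $|\det|$ with $n^{n/2}$. The determinant routine is a plain Gaussian elimination written for this illustration, not a library function, and the function names are hypothetical.

```python
# Sketch; all function names here are illustrative (hypothetical).
import cmath


def det(M):
    """Determinant by Gaussian elimination with partial pivoting (complex entries)."""
    A = [list(row) for row in M]
    n = len(A)
    result = 1
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if A[p][c] == 0:
            return 0
        if p != c:
            A[c], A[p] = A[p], A[c]
            result = -result
        result *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
    return result


def root_of_unity_matrix(n):
    """The n x n matrix (omega^{jk}), 1 <= j, k <= n, with omega = e^{2 pi i / n}."""
    omega = cmath.exp(2j * cmath.pi / n)
    return [[omega ** (j * k) for k in range(1, n + 1)] for j in range(1, n + 1)]


for n in range(1, 9):
    value = abs(det(root_of_unity_matrix(n)))
    print(n, round(value, 6), round(n ** (n / 2), 6))
```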

Without loss of generality one may suppose $\mu = 1$. A real $n \times n$ matrix for which
the upper bound $n^{n/2}$ is attained in this case is today called a Hadamard matrix. It
is still an open question whether an n x n Hadamard matrix exists for every positive 
integer n divisible by 4. 

Hadamard’s inequality played an important role in the theory of linear integral 
equations created by Fredholm (1900), and partly for this reason many proofs and 
generalizations were soon given. Fredholm’s approach to linear integral equations has 
been superseded, but Hadamard’s inequality has found connections with several other 
branches of mathematics, such as number theory, combinatorics and group theory. 
Hadamard matrices have been used to enhance the precision of spectrometers, to 
design agricultural experiments and to correct errors in messages transmitted by 
spacecraft. 

The moral is that a good mathematical problem will in time find applications. 
Although the case where n is divisible by 4 has a richer theory, we will also treat 
other cases of Hadamard’s determinant problem, since progress with them might lead 
to progress also for Hadamard matrices. 


1 What is a Determinant? 


The system of two simultaneous linear equations 


$$\begin{aligned} \alpha_{11}\xi_1 + \alpha_{12}\xi_2 &= \beta_1 \\ \alpha_{21}\xi_1 + \alpha_{22}\xi_2 &= \beta_2 \end{aligned}$$


W.A. Coppel, Number Theory: An Introduction to Mathematics, Universitext, 223 
has, if $\delta_2 = \alpha_{11}\alpha_{22} - \alpha_{12}\alpha_{21}$ is nonzero, the unique solution

$$\xi_1 = (\beta_1\alpha_{22} - \beta_2\alpha_{12})/\delta_2, \quad \xi_2 = -(\beta_1\alpha_{21} - \beta_2\alpha_{11})/\delta_2.$$

If $\delta_2 = 0$, then either there is no solution or there is more than one solution.
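Similarly the system of three simultaneous linear equations

$$\begin{aligned} \alpha_{11}\xi_1 + \alpha_{12}\xi_2 + \alpha_{13}\xi_3 &= \beta_1 \\ \alpha_{21}\xi_1 + \alpha_{22}\xi_2 + \alpha_{23}\xi_3 &= \beta_2 \\ \alpha_{31}\xi_1 + \alpha_{32}\xi_2 + \alpha_{33}\xi_3 &= \beta_3 \end{aligned}$$

has a unique solution if and only if $\delta_3 \neq 0$, where

$$\delta_3 = \alpha_{11}\alpha_{22}\alpha_{33} + \alpha_{12}\alpha_{23}\alpha_{31} + \alpha_{13}\alpha_{21}\alpha_{32} - \alpha_{11}\alpha_{23}\alpha_{32} - \alpha_{12}\alpha_{21}\alpha_{33} - \alpha_{13}\alpha_{22}\alpha_{31}.$$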


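These considerations may be extended to any finite number of simultaneous linear equations. The system

$$\begin{aligned} \alpha_{11}\xi_1 + \alpha_{12}\xi_2 + \cdots + \alpha_{1n}\xi_n &= \beta_1 \\ \alpha_{21}\xi_1 + \alpha_{22}\xi_2 + \cdots + \alpha_{2n}\xi_n &= \beta_2 \\ &\;\;\vdots \\ \alpha_{n1}\xi_1 + \alpha_{n2}\xi_2 + \cdots + \alpha_{nn}\xi_n &= \beta_n \end{aligned}$$

has a unique solution if and only if $\delta_n \neq 0$, where

$$\delta_n = \sum \pm \alpha_{1k_1}\alpha_{2k_2}\cdots\alpha_{nk_n},$$

the sum being taken over all $n!$ permutations $k_1, k_2, \ldots, k_n$ of $1, 2, \ldots, n$, and the sign chosen being $+$ or $-$ according as the permutation is even or odd, as defined in Chapter I, §7.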
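The permutation expansion and the quotient-of-determinants rule for the solution (Cramer's rule, discussed below) translate directly into code. The sketch below uses exact rational arithmetic. It is meant to illustrate the definitions, not to be efficient: the expansion has $n!$ terms. All function names are hypothetical.

```python
# Sketch; all function names here are illustrative (hypothetical).
from fractions import Fraction
from itertools import permutations


def sign(perm):
    """+1 for an even permutation, -1 for an odd one (parity of inversions)."""
    inversions = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def det(M):
    """delta_n = sum over permutations (k_1, ..., k_n) of +- a_{1k_1} ... a_{nk_n}."""
    n = len(M)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for row, col in enumerate(perm):
            term *= M[row][col]
        total += term
    return total


def cramer(M, b):
    """Solve M x = b by Cramer's rule; requires det(M) != 0."""
    d = det(M)
    if d == 0:
        raise ValueError("singular system")
    n = len(M)
    # x_k is the determinant of M with column k replaced by b, divided by det(M).
    return [Fraction(det([row[:k] + [b[i]] + row[k + 1:] for i, row in enumerate(M)]), d)
            for k in range(n)]


print(det([[1, 2], [3, 4]]))             # -2
print(cramer([[2, 1], [1, 3]], [3, 5]))  # [Fraction(4, 5), Fraction(7, 5)]
```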

It has been tacitly assumed that the given quantities $\alpha_{jk}, \beta_j$ $(j, k = 1, \ldots, n)$ are real numbers, in which case the solution $\xi_k$ $(k = 1, \ldots, n)$ also consists of real numbers. However, everything that has been said remains valid if the given quantities are elements of an arbitrary field $F$, in which case the solution also consists of elements of $F$. Since $\delta_n$ is an element of $F$ which is uniquely determined by the matrix

$$A = \begin{pmatrix} \alpha_{11} & \cdots & \alpha_{1n} \\ \vdots & & \vdots \\ \alpha_{n1} & \cdots & \alpha_{nn} \end{pmatrix},$$

it will be called the determinant of the matrix $A$ and denoted by $\det A$.

Determinants appear in the work of the Japanese mathematician Seki (1683) and 
in a letter of Leibniz (1693) to l’Hospital, but neither had any influence on later 
developments. The rule which expresses the solution of a system of linear equations by 
quotients of determinants was stated by Cramer (1750), but the study of determinants 
for their own sake began with Vandermonde (1771). The word ‘determinant’ was first 
used in the present sense by Cauchy (1812), who gave a systematic account of their
theory. The diffusion of this theory throughout the mathematical world owes much to
the clear exposition of Jacobi (1841). 

For the practical solution of linear equations Cramer’s rule is certainly inferior to 
the age-old method of elimination of variables. Even many of the theoretical uses to 
which determinants were once put have been replaced by simpler arguments from 
linear algebra, to the extent that some have advocated banning determinants from 
the curriculum. However, determinants have a geometrical interpretation which makes 
their survival desirable. 

Let $M_n(\mathbb{R})$ denote the set of all $n \times n$ matrices with entries from the real field $\mathbb{R}$. If $A \in M_n(\mathbb{R})$, then the linear map $x \mapsto Ax$ of $\mathbb{R}^n$ into itself multiplies the volume of any parallelotope by a fixed factor $\mu(A) \ge 0$. Evidently

(i)″ $\mu(AB) = \mu(A)\mu(B)$ for all $A, B \in M_n(\mathbb{R})$,
(ii)″ $\mu(D) = |\alpha|$ for any diagonal matrix $D = \operatorname{diag}[1, \ldots, 1, \alpha] \in M_n(\mathbb{R})$.

(A matrix $A = (\alpha_{jk})$ is denoted by $\operatorname{diag}[\alpha_{11}, \alpha_{22}, \ldots, \alpha_{nn}]$ if $\alpha_{jk} = 0$ whenever $j \neq k$ and is then said to be diagonal.) It may be shown (e.g., by representing $A$ as a product of elementary matrices in the manner described below) that $\mu(A) = |\det A|$. The sign of the determinant also has a geometrical interpretation: for nonsingular $A$, $\det A > 0$ or $\det A < 0$ according as the linear map $x \mapsto Ax$ preserves or reverses orientation.

Now let $F$ be an arbitrary field and let $M_n = M_n(F)$ denote the set of all $n \times n$ matrices with entries from $F$. We intend to show that determinants, as defined above, have the properties:

(i)′ $\det(AB) = \det A \cdot \det B$ for all $A, B \in M_n$,
(ii)′ $\det D = \alpha$ for any diagonal matrix $D = \operatorname{diag}[1, \ldots, 1, \alpha] \in M_n$,


and, moreover, that these two properties actually characterize determinants. To avoid 
notational complexity, we consider first the case n = 2. 

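Let $\mathcal{S}$ denote the set of all matrices $A \in M_2$ which are products of finitely many matrices of the form $U_\lambda, V_\mu$, where

$$U_\lambda = \begin{pmatrix} 1 & \lambda \\ 0 & 1 \end{pmatrix}, \quad V_\mu = \begin{pmatrix} 1 & 0 \\ \mu & 1 \end{pmatrix}$$

and $\lambda, \mu \in F$. The set $\mathcal{S}$ is a group under matrix multiplication, since multiplication is associative, $I \in \mathcal{S}$, $\mathcal{S}$ is obviously closed under multiplication and $U_\lambda, V_\mu$ have inverses $U_{-\lambda}, V_{-\mu}$ respectively.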

We are going to show that, if $A \in M_2$ and $A \neq O$, then there exist $S, T \in \mathcal{S}$ and $\delta \in F$ such that $SAT = \operatorname{diag}[1, \delta]$.


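For any $\rho \neq 0$, put

$$W = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \quad R_\rho = \begin{pmatrix} \rho^{-1} & 0 \\ 0 & \rho \end{pmatrix}.$$

Then $W = U_{-1}V_1U_{-1} \in \mathcal{S}$ and also $R_\rho \in \mathcal{S}$ since, if $\sigma = 1 - \rho$, $\mu = \rho^{-1}$ and $\tau = \rho^2 - \rho$, then

$$R_\rho = V_{-1}U_\sigma V_\mu U_\tau.$$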
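Both factorizations can be confirmed by direct multiplication. The sketch below does this with exact rational arithmetic for a few sample values of $\rho$, using $W = U_{-1}V_1U_{-1}$ and $R_\rho = V_{-1}U_\sigma V_\mu U_\tau$ with $\sigma = 1 - \rho$, $\mu = \rho^{-1}$, $\tau = \rho^2 - \rho$. The helper names are hypothetical, and matrices are plain nested lists.

```python
# Sketch; all helper names here are illustrative (hypothetical).
from fractions import Fraction
from functools import reduce


def mul(X, Y):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[X[i][0] * Y[0][j] + X[i][1] * Y[1][j] for j in range(2)] for i in range(2)]


def U(lam):
    return [[1, lam], [0, 1]]


def V(mu):
    return [[1, 0], [mu, 1]]


def R(rho):
    return [[1 / rho, 0], [0, rho]]


W = [[0, -1], [1, 0]]


def product(*matrices):
    return reduce(mul, matrices)


assert product(U(-1), V(1), U(-1)) == W
for rho in [Fraction(2), Fraction(-3, 5), Fraction(7, 4)]:
    sigma, mu, tau = 1 - rho, 1 / rho, rho * rho - rho
    assert product(V(-1), U(sigma), V(mu), U(tau)) == R(rho)
print("factorizations of W and R_rho verified")
```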
Let

$$A = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix},$$

where at least one of $\alpha, \beta, \gamma, \delta$ is nonzero. By multiplying $A$ on the left, or on the right, or both by $W$ we may suppose that $\alpha \neq 0$. Now, by multiplying $A$ on the right or left by $R_\alpha$, we may suppose that $\alpha = 1$. Next, by multiplying $A$ on the right by $U_{-\beta}$, we may further suppose that $\beta = 0$. Finally, by multiplying $A$ on the left by $V_{-\gamma}$, we may also suppose that $\gamma = 0$.
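The four steps of this reduction can be followed mechanically. The sketch below records the factors $S$ and $T$ while reducing a nonzero $2 \times 2$ matrix over $\mathbb{Q}$ to $\operatorname{diag}[1, \delta]$, using $W$, $R_\alpha$ on the right, $U_{-\beta}$ and $V_{-\gamma}$ as in the argument. The 2×2 helpers are defined again so that the block runs on its own. All names are hypothetical. The final assertion also shows that $\delta = \alpha\delta' - \beta\gamma$ for the original entries, as one expects from the later identification of $\delta$ with the determinant.

```python
# Sketch; all function and helper names here are illustrative (hypothetical).
from fractions import Fraction


def mul(X, Y):
    return [[X[i][0] * Y[0][j] + X[i][1] * Y[1][j] for j in range(2)] for i in range(2)]


def U(lam):
    return [[1, lam], [0, 1]]


def V(mu):
    return [[1, 0], [mu, 1]]


def R(rho):
    return [[1 / rho, 0], [0, rho]]


W = [[0, -1], [1, 0]]
I2 = [[1, 0], [0, 1]]


def reduce_to_diagonal(A):
    """Return S, T (products of W, R, U, V) and delta with S A T = diag[1, delta]."""
    A = [[Fraction(x) for x in row] for row in A]
    if all(x == 0 for row in A for x in row):
        raise ValueError("A must be nonzero")
    # Step 1: make the (1,1) entry nonzero using W on the left and/or right.
    S, T = I2, I2
    if A[0][0] == 0:
        if A[0][1] != 0:
            T = W
        elif A[1][0] != 0:
            S = W
        else:
            S, T = W, W
    B = mul(mul(S, A), T)
    # Step 2: right multiplication by R_alpha divides the first column by alpha.
    r = R(B[0][0])
    B, T = mul(B, r), mul(T, r)
    # Step 3: right multiplication by U_{-beta} clears the (1,2) entry.
    u = U(-B[0][1])
    B, T = mul(B, u), mul(T, u)
    # Step 4: left multiplication by V_{-gamma} clears the (2,1) entry.
    v = V(-B[1][0])
    B, S = mul(v, B), mul(v, S)
    return S, T, B[1][1]


A = [[0, 3], [2, 5]]
S, T, delta = reduce_to_diagonal(A)
print(delta)  # -6
assert mul(mul(S, A), T) == [[1, 0], [0, delta]]
```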

The preceding argument is valid even if F is a division ring. In what follows we 
will use the commutativity of multiplication in F. 

We are now going to show that if $d : \mathcal{S} \to F$ is a map such that $d(ST) = d(S)d(T)$ for all $S, T \in \mathcal{S}$, then either $d(S) = 0$ for every $S \in \mathcal{S}$ or $d(S) = 1$ for every $S \in \mathcal{S}$.

If $d(T) = 0$ for some $T \in \mathcal{S}$, then $d(I) = d(T)d(T^{-1}) = 0$ and $d(S) = d(I)d(S) = 0$ for every $S \in \mathcal{S}$. Thus we now suppose $d(S) \neq 0$ for every $S \in \mathcal{S}$. Then, in the same way, $d(I) = 1$ and $d(S^{-1}) = d(S)^{-1}$ for every $S \in \mathcal{S}$.

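It is easily verified that

$$U_\lambda U_\mu = U_{\lambda+\mu}, \quad V_\lambda V_\mu = V_{\lambda+\mu},$$
$$W^{-1} = -W, \quad W^{-1}V_\lambda W = U_{-\lambda}.$$

It follows that

$$d(V_\lambda) = d(U_{-\lambda}) = d(U_\lambda)^{-1}.$$

Also, for any $\rho \neq 0$,

$$R_\rho^{-1}U_\lambda R_\rho = U_{\lambda\rho^2}.$$

Hence $d(U_{\lambda\rho^2}) = d(U_\lambda)$ and $d(U_{\lambda(\rho^2 - 1)}) = 1$.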

If the field $F$ contains more than three elements, then $\rho^2 - 1 \neq 0$ for some nonzero $\rho \in F$. Since $\lambda(\rho^2 - 1)$ runs through the nonzero elements of $F$ at the same time as $\lambda$, it follows that $d(U_\lambda) = 1$ for every $\lambda \in F$. Hence also $d(V_\mu) = 1$ for every $\mu \in F$ and $d(S) = 1$ for all $S \in \mathcal{S}$.

If $F$ contains 2 elements, then $d(S) = 1$ for every $S \in \mathcal{S}$ is the only possibility. If $F$ contains 3 elements, then $d(S) = \pm 1$ for every $S \in \mathcal{S}$. Hence $d(S^{-1}) = d(S)$ and $d(S^2) = 1$. Since $U_1^2 = U_{-1}$ and $U_{-1}^2 = U_1$, this implies $d(U_\lambda) = 1$ for every $\lambda \in F$, and the rest follows as before.

The preceding discussion is easily extended to higher dimensions. Put

$$U_{ij}(\lambda) = I_n + \lambda E_{ij},$$

for any $i, j \in \{1, \ldots, n\}$ with $i \neq j$, where $E_{ij}$ is the $n \times n$ matrix with all entries 0 except the $(i, j)$-th, which is 1, and let $SL_n(F)$ denote the set of all $A \in M_n$ which are products of finitely many matrices $U_{ij}(\lambda)$. Then $SL_n(F)$ is a group under matrix multiplication.

If $A \in M_n$ and $A \neq O$, then there exist $S, T \in SL_n(F)$ and a positive integer $r \le n$ such that

$$SAT = \operatorname{diag}[I_{r-1}, \delta, O_{n-r}]$$

for some nonzero $\delta \in F$. The matrix $A$ is singular if $r < n$ and nonsingular if $r = n$. Hence $A = (\alpha_{jk})$ is nonsingular if and only if its transpose $A^t = (\alpha_{kj})$ is nonsingular. In the nonsingular case we need multiply $A$ on only one side by a matrix from $SL_n(F)$ to bring it to the form

$$D_\delta = \operatorname{diag}[I_{n-1}, \delta].$$

For if $SAT = D_\delta$, then $SA = D_\delta T^{-1}$ and this implies $SA = S'D_\delta$ for some $S' \in SL_n(F)$, since

$$\begin{aligned} D_\delta U_{ij}(\lambda) &= U_{ij}(\lambda\delta^{-1})D_\delta && \text{if } i < j = n, \\ D_\delta U_{ij}(\lambda) &= U_{ij}(\lambda\delta)D_\delta && \text{if } j < i = n, \\ D_\delta U_{ij}(\lambda) &= U_{ij}(\lambda)D_\delta && \text{if } i, j \neq n \text{ and } i \neq j. \end{aligned}$$

In the same way as for $n = 2$ it may be shown that, if $d : SL_n(F) \to F$ is a map such that $d(ST) = d(S)d(T)$ for all $S, T \in SL_n(F)$, then either $d(S) = 0$ for every $S$ or $d(S) = 1$ for every $S$.


Theorem 1 There exists a unique map $d : M_n \to F$ such that
(i)′ $d(AB) = d(A)d(B)$ for all $A, B \in M_n$,
(ii)′ for any $\alpha \in F$, if $D_\alpha = \operatorname{diag}[I_{n-1}, \alpha]$, then $d(D_\alpha) = \alpha$.


Proof We consider first uniqueness. Since $d(I) = d(D_1) = 1$, we must have $d(S) = 1$ for every $S \in SL_n(F)$, by what we have just said. Also, if

$$H = \operatorname{diag}[\eta_1, \ldots, \eta_{n-1}, 0],$$

then $d(H) = 0$, since $H = D_0H$. In particular, $d(O) = 0$. If $A \in M_n$ and $A \neq O$, there exist $S, T \in SL_n(F)$ such that

$$SAT = \operatorname{diag}[I_{r-1}, \delta, O_{n-r}],$$

where $1 \le r \le n$ and $\delta \neq 0$. It follows that $d(A) = 0$ if $r < n$, i.e. if $A$ is singular. On the other hand if $r = n$, i.e. if $A$ is nonsingular, then $SAT = D_\delta$ and hence $d(A) = \delta$. This proves uniqueness.

We consider next existence. For any $A = (\alpha_{jk}) \in M_n$, define

$$\det A = \sum_{\sigma \in \mathcal{S}_n} (\operatorname{sgn}\sigma)\, \alpha_{1\sigma 1}\alpha_{2\sigma 2}\cdots\alpha_{n\sigma n},$$

where $\sigma$ is a permutation of $1, 2, \ldots, n$, $\operatorname{sgn}\sigma = 1$ or $-1$ according as the permutation $\sigma$ is even or odd, and the summation is over the symmetric group $\mathcal{S}_n$ of all permutations. Several consequences of this definition will now be derived.


(i) if every entry in some row of A is 0, then det A = 0. 


Proof Every summand vanishes in the expression for det A. 
(ii) if the matrix $B$ is obtained from the matrix $A$ by multiplying all entries in one row by $\lambda$, then $\det B = \lambda \det A$.


Proof This is also clear, since in the expression for det A each summand contains 
exactly one factor from any given row. 


(iii) if two rows of A are the same, then det A = 0. 


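Proof Suppose for definiteness that the first and second rows are the same, and let $\tau$ be the permutation which interchanges 1 and 2 and leaves fixed every $k > 2$. Then $\tau$ is odd and we can write

$$\det A = \sum_{\sigma \in \mathcal{A}_n} \alpha_{1\sigma 1}\alpha_{2\sigma 2}\cdots\alpha_{n\sigma n} - \sum_{\sigma \in \mathcal{A}_n} \alpha_{1\sigma\tau 1}\alpha_{2\sigma\tau 2}\cdots\alpha_{n\sigma\tau n},$$

where $\mathcal{A}_n$ is the alternating group of all even permutations. In the second sum

$$\alpha_{1\sigma\tau 1}\alpha_{2\sigma\tau 2}\cdots\alpha_{n\sigma\tau n} = \alpha_{1\sigma 2}\alpha_{2\sigma 1}\alpha_{3\sigma 3}\cdots\alpha_{n\sigma n} = \alpha_{2\sigma 2}\alpha_{1\sigma 1}\alpha_{3\sigma 3}\cdots\alpha_{n\sigma n},$$

because the first and second rows are the same. Hence the two sums cancel.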


(iv) if the matrix B is obtained from the matrix A by adding a scalar multiple of one 
row to a different row, then det B = det A. 


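Proof Suppose for definiteness that $B$ is obtained from $A$ by adding $\lambda$ times the second row to the first. Then

$$\det B = \sum_{\sigma \in \mathcal{S}_n} (\operatorname{sgn}\sigma)\,\alpha_{1\sigma 1}\alpha_{2\sigma 2}\cdots\alpha_{n\sigma n} + \lambda \sum_{\sigma \in \mathcal{S}_n} (\operatorname{sgn}\sigma)\,\alpha_{2\sigma 1}\alpha_{2\sigma 2}\cdots\alpha_{n\sigma n}.$$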


The first sum is det A and the second sum is 0, by (iii), since it is the determinant of 
the matrix obtained from A by replacing the first row by the second. 


(v) if A is singular, then det A = 0. 


Proof If A is singular, then some row of A is a linear combination of the remaining 
rows. Thus by subtracting from this row scalar multiples of the remaining rows we can 
replace it by a row of 0’s. For the new matrix B we have det B = 0, by (i). On the 
other hand, det B = det A, by (iv). 


(vi) if $A = \operatorname{diag}[\delta_1, \ldots, \delta_n]$, then $\det A = \delta_1 \cdots \delta_n$. In particular, $\det D_\alpha = \alpha$.


Proof In the expression for $\det A$ the only possible nonzero summand is that for which $\sigma$ is the identity permutation, and the identity permutation is even.


(vii) $\det(AB) = \det A \cdot \det B$ for all $A, B \in M_n$.


Proof If $A$ is singular, then $AB$ is also and so, by (v), $\det(AB) = 0 = \det A \cdot \det B$. Thus we now suppose that $A$ is nonsingular. Then there exists $S \in SL_n(F)$ such that $SA = D_\delta$ for some nonzero $\delta \in F$. Since, by the definition of $SL_n(F)$, left multiplication by $S$ corresponds to a finite number of operations of the type considered in (iv) we have

$$\det A = \det(SA) = \det D_\delta$$

and

$$\det(AB) = \det(SAB) = \det(D_\delta B).$$

But $\det D_\delta = \delta$, by (vi), and $\det(D_\delta B) = \delta \det B$, by (ii). Therefore $\det(AB) = \det A \cdot \det B$.
This completes the proof of existence. 


Corollary 2 If $A \in M_n$ and if $A^t$ is the transpose of $A$, then $\det A^t = \det A$.

Proof The map $d : M_n \to F$ defined by $d(A) = \det A^t$ also has the properties (i)′, (ii)′.

The proof of Theorem 1 shows further that $SL_n(F)$ is the special linear group, consisting of all $A \in M_n$ with $\det A = 1$.


We do not propose to establish here all the properties of determinants which we 
may later require. However, we note that if 


$$A = \begin{pmatrix} B & C \\ O & D \end{pmatrix}$$

is a partitioned matrix, where $B$ and $D$ are square matrices of smaller size, then

$$\det A = \det B \cdot \det D.$$

It follows that if $A = (\alpha_{jk})$ is lower triangular (i.e. $\alpha_{jk} = 0$ for all $j, k$ with $j < k$) or upper triangular (i.e. $\alpha_{jk} = 0$ for all $j, k$ with $j > k$), then

$$\det A = \alpha_{11}\alpha_{22}\cdots\alpha_{nn}.$$


2 Hadamard Matrices 


We begin by obtaining an upper bound for $\det(A^tA)$, where $A$ is an $n \times m$ real matrix. If $m = n$, then $\det(A^tA) = (\det A)^2$ and bounding $\det(A^tA)$ is the same as Hadamard's problem of bounding $|\det A|$. However, as we will see in §3, the problem is of interest also for $m < n$.

In the statement of the following result we denote by $\|v\|$ the Euclidean norm of a vector $v = (\alpha_1, \ldots, \alpha_n) \in \mathbb{R}^n$. Thus $\|v\| \ge 0$ and $\|v\|^2 = \alpha_1^2 + \cdots + \alpha_n^2$. The geometrical interpretation of the result is that a parallelotope with given side lengths has maximum volume when the sides are orthogonal.


Proposition 3 Let $A$ be an $n \times m$ real matrix with linearly independent columns $v_1, \ldots, v_m$. Then

$$\det(A^tA) \le \prod_{k=1}^{m} \|v_k\|^2,$$

with equality if and only if $A^tA$ is a diagonal matrix.
Proof We are going to construct inductively mutually orthogonal vectors $w_1, \ldots, w_m$ such that $w_k$ is a linear combination of $v_1, \ldots, v_k$ in which the coefficient of $v_k$ is 1 $(1 \le k \le m)$. Take $w_1 = v_1$ and suppose $w_1, \ldots, w_{k-1}$ have been determined. If we take

$$w_k = v_k - \alpha_1 w_1 - \cdots - \alpha_{k-1}w_{k-1},$$

where $\alpha_j = (v_k, w_j)/(w_j, w_j)$, then $(w_k, w_j) = 0$ $(1 \le j < k)$. Moreover, $w_k \neq 0$, since $v_1, \ldots, v_k$ are linearly independent. (This is the same process as in §10 of Chapter I, but without the normalization.)

If $B$ is the matrix with columns $w_1, \ldots, w_m$ then, by construction,

$$B^tB = \operatorname{diag}[\delta_1, \ldots, \delta_m]$$

is a diagonal matrix with diagonal entries $\delta_k = \|w_k\|^2$ and $AT = B$ for some upper triangular matrix $T$ with 1's in the main diagonal. Since $\det T = 1$, we have

$$\det(A^tA) = \det(B^tB) = \prod_{k=1}^{m} \|w_k\|^2.$$

But

$$\|v_k\|^2 = \|w_k\|^2 + \alpha_1^2\|w_1\|^2 + \cdots + \alpha_{k-1}^2\|w_{k-1}\|^2$$

and hence $\|w_k\|^2 \le \|v_k\|^2$, with equality only if $w_k = v_k$. The result follows.
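The proof is constructive, so it can be run. The sketch below performs the unnormalized orthogonalization with exact rational arithmetic, computes $\det(A^tA)$ as $\prod \|w_k\|^2$, and compares it with the bound $\prod \|v_k\|^2$. The function names and the sample columns are illustrative (hypothetical).

```python
# Sketch; all function names here are illustrative (hypothetical).
from fractions import Fraction


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def orthogonalize(columns):
    """Unnormalized Gram-Schmidt: w_k = v_k - sum_j (v_k, w_j)/(w_j, w_j) w_j."""
    ws = []
    for v in columns:
        w = [Fraction(x) for x in v]
        for u in ws:
            alpha = dot(v, u) / dot(u, u)
            w = [wi - alpha * ui for wi, ui in zip(w, u)]
        if all(x == 0 for x in w):
            raise ValueError("columns are linearly dependent")
        ws.append(w)
    return ws


def gram_determinant(columns):
    """det(A^t A) computed as the product of ||w_k||^2."""
    result = Fraction(1)
    for w in orthogonalize(columns):
        result *= dot(w, w)
    return result


def hadamard_bound(columns):
    """The right-hand side of Proposition 3: the product of ||v_k||^2."""
    result = 1
    for v in columns:
        result *= dot(v, v)
    return result


cols = [(1, 2, 2), (3, 0, 4)]
print(gram_determinant(cols), hadamard_bound(cols))  # 104 225
```

For two columns, $\det(A^tA) = \|v_1\|^2\|v_2\|^2 - (v_1, v_2)^2 = 9 \cdot 25 - 11^2 = 104$, which agrees with the product of the $\|w_k\|^2$.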


Corollary 4 Let $A = (\alpha_{jk})$ be an $n \times m$ real matrix such that $|\alpha_{jk}| \le 1$ for all $j, k$. Then

$$\det(A^tA) \le n^m,$$

with equality if and only if $\alpha_{jk} = \pm 1$ for all $j, k$ and $A^tA = nI_m$.

Proof We may assume that the columns of $A$ are linearly independent, since otherwise $\det(A^tA) = 0$. If $v_k$ is the $k$-th column of $A$, then $\|v_k\|^2 \le n$, with equality if and only if $|\alpha_{jk}| = 1$ for $1 \le j \le n$. The result now follows from Proposition 3.


An $n \times m$ matrix $A = (\alpha_{jk})$ will be said to be an H-matrix if $\alpha_{jk} = \pm 1$ for all $j, k$ and $A^tA = nI_m$. If, in addition, $m = n$ then $A$ will be said to be a Hadamard matrix of order $n$.

If $A$ is an $n \times m$ H-matrix, then $m \le n$. Furthermore, if $A$ is a Hadamard matrix of order $n$ then, for any $m \le n$, the submatrix formed by the first $m$ columns of $A$ is an H-matrix. (This distinction between H-matrices and Hadamard matrices is convenient, but not standard. It is an unproven conjecture that any H-matrix can be completed to a Hadamard matrix.)

The transpose $A^t$ of a Hadamard matrix $A$ is again a Hadamard matrix, since $A^t = nA^{-1}$ commutes with $A$. The $1 \times 1$ unit matrix is a Hadamard matrix, and so is the $2 \times 2$ matrix

$$\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.$$


There is one rather simple procedure for constructing H-matrices. If $A = (\alpha_{jk})$ is an $n \times m$ matrix and $B = (\beta_{i\ell})$ a $q \times p$ matrix, then the $nq \times mp$ matrix

$$\begin{pmatrix} \alpha_{11}B & \alpha_{12}B & \cdots & \alpha_{1m}B \\ \alpha_{21}B & \alpha_{22}B & \cdots & \alpha_{2m}B \\ \vdots & \vdots & & \vdots \\ \alpha_{n1}B & \alpha_{n2}B & \cdots & \alpha_{nm}B \end{pmatrix},$$

with entries $\alpha_{jk}\beta_{i\ell}$, is called the Kronecker product of $A$ and $B$ and is denoted by $A \otimes B$. It is easily verified that

$$(A \otimes B)(C \otimes D) = AC \otimes BD$$

and

$$(A \otimes B)^t = A^t \otimes B^t.$$

It follows directly from these rules of calculation that if $A_1$ is an $n_1 \times m_1$ H-matrix and $A_2$ an $n_2 \times m_2$ H-matrix, then $A_1 \otimes A_2$ is an $n_1n_2 \times m_1m_2$ H-matrix. Consequently, since there exist Hadamard matrices of orders 1 and 2, there also exist Hadamard matrices of order any power of 2. This was already known to Sylvester (1867).
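Sylvester's construction is a one-line iteration of the Kronecker product. The sketch below forms $A \otimes B$ as the block matrix $(\alpha_{jk}B)$, builds the $k$-fold Kronecker power of the $2 \times 2$ Hadamard matrix, and checks the defining properties of an H-matrix (entries $\pm 1$ and $A^tA = nI_m$), including for a submatrix formed by the first columns. The function names are hypothetical.

```python
# Sketch; all function names here are illustrative (hypothetical).
def kron(A, B):
    """Kronecker product: the block matrix (a_jk * B)."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def is_h_matrix(A):
    """Entries +-1 and A^t A = n I_m, where A is n x m."""
    n, m = len(A), len(A[0])
    if any(x not in (1, -1) for row in A for x in row):
        return False
    return all(sum(A[r][i] * A[r][j] for r in range(n)) == (n if i == j else 0)
               for i in range(m) for j in range(m))


def sylvester(k):
    """Hadamard matrix of order 2^k as a k-fold Kronecker power of [[1, 1], [1, -1]]."""
    H = [[1]]
    for _ in range(k):
        H = kron(H, [[1, 1], [1, -1]])
    return H


H8 = sylvester(3)
print(len(H8), is_h_matrix(H8))                   # 8 True
print(is_h_matrix([row[:3] for row in H8]))       # True: first 3 columns form an H-matrix
```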


Proposition 5 Let $A = (\alpha_{jk})$ be an $n \times m$ H-matrix. If $m > 1$, then $n$ is even and any two distinct columns of $A$ have the same entries in exactly $n/2$ rows. If $m > 2$, then $n$ is divisible by 4 and any three distinct columns of $A$ have the same entries in exactly $n/4$ rows.


Proof If $j \neq k$, then

$$\alpha_{1j}\alpha_{1k} + \cdots + \alpha_{nj}\alpha_{nk} = 0.$$

Since $\alpha_{ij}\alpha_{ik} = 1$ if the $j$-th and $k$-th columns have the same entry in the $i$-th row and $= -1$ otherwise, the number of rows in which the $j$-th and $k$-th columns have the same entry is $n/2$.

If $j, k, \ell$ are all different, then

$$\sum_{i=1}^{n} (\alpha_{ij} + \alpha_{ik})(\alpha_{ij} + \alpha_{i\ell}) = \sum_{i=1}^{n} \alpha_{ij}^2 = n.$$

But $(\alpha_{ij} + \alpha_{ik})(\alpha_{ij} + \alpha_{i\ell}) = 4$ if the $j$-th, $k$-th and $\ell$-th columns all have the same entry in the $i$-th row and $= 0$ otherwise. Hence the number of rows in which the $j$-th, $k$-th and $\ell$-th columns all have the same entry is exactly $n/4$.


Thus the order $n$ of a Hadamard matrix must be divisible by 4 if $n > 2$. It is unknown if a Hadamard matrix of order $n$ exists for every $n$ divisible by 4. However, it is known for $n < 424$ and for several infinite families of $n$. We restrict attention here to the family of Hadamard matrices constructed by Paley (1933).

The following lemma may be immediately verified by matrix multiplication. 
Lemma 6 Let $C$ be an $n \times n$ matrix, with 0's on the main diagonal and all other entries 1 or $-1$, such that

$$C^tC = (n - 1)I_n.$$

If $C$ is skew-symmetric (i.e. $C^t = -C$), then $C + I$ is a Hadamard matrix of order $n$, whereas if $C$ is symmetric (i.e. $C^t = C$), then

$$\begin{pmatrix} C + I & C - I \\ C - I & -C - I \end{pmatrix}$$

is a Hadamard matrix of order $2n$.


Proposition 7 If $q$ is a power of an odd prime, there exists a $(q + 1) \times (q + 1)$ matrix $C$ with 0's on the main diagonal and all other entries 1 or $-1$, such that

(i) $C^tC = qI_{q+1}$,
(ii) $C$ is skew-symmetric if $q \equiv 3 \bmod 4$ and symmetric if $q \equiv 1 \bmod 4$.


Proof Let $F$ be a finite field containing $q$ elements. Since $q$ is odd, not all elements of $F$ are squares. For any $a \in F$, put

$$\chi(a) = \begin{cases} 0 & \text{if } a = 0, \\ 1 & \text{if } a \neq 0 \text{ and } a = c^2 \text{ for some } c \in F, \\ -1 & \text{if } a \text{ is not a square.} \end{cases}$$

If $q = p$ is a prime, then $F$ is the field of integers modulo $p$ and $\chi(a) = (a/p)$ is the Legendre symbol studied in Chapter II. The following argument may be restricted to this case, if desired.

Since the multiplicative group of $F$ is cyclic, we have

$$\chi(ab) = \chi(a)\chi(b) \quad \text{for all } a, b \in F.$$

Since the number of nonzero elements which are squares is equal to the number which are non-squares, we also have

$$\sum_{a \in F} \chi(a) = 0.$$

It follows that, for any $c \neq 0$,

$$\sum_{b \in F} \chi(b)\chi(b + c) = \sum_{b \neq 0} \chi(b)^2\chi(1 + cb^{-1}) = \sum_{x \neq 1} \chi(x) = -1.$$

Let $0 = a_0, a_1, \ldots, a_{q-1}$ be an enumeration of the elements of $F$ and define a $q \times q$ matrix $Q = (q_{jk})$ by

$$q_{jk} = \chi(a_j - a_k) \quad (0 \le j, k < q).$$

Thus $Q$ has 0's on the main diagonal and $\pm 1$'s elsewhere. Also, by what has been said in the previous paragraph, if $J_m$ denotes the $m \times m$ matrix with all entries 1, then

$$QJ_q = O, \quad QQ^t = qI_q - J_q.$$

Furthermore, since $\chi(-1) = (-1)^{(q-1)/2}$, $Q$ is symmetric if $q \equiv 1 \bmod 4$ and skew-symmetric if $q \equiv 3 \bmod 4$. If $e_m$ denotes the $1 \times m$ matrix with all entries 1, it follows that the matrix

$$C = \begin{pmatrix} 0 & e_q \\ \pm e_q^t & Q \end{pmatrix},$$

where the $\pm$ sign is chosen according as $q \equiv \pm 1 \bmod 4$, satisfies the various requirements.


By combining Lemma 6 with Proposition 7 we obtain Paley's result that, for any odd prime power $q$, there exists a Hadamard matrix of order $q + 1$ if $q \equiv 3 \bmod 4$ and of order $2(q + 1)$ if $q \equiv 1 \bmod 4$. Together with the Kronecker product construction, this establishes the existence of Hadamard matrices for all orders $n \equiv 0 \bmod 4$ with $n < 100$, except $n = 92$.
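Paley's construction is easy to run when $q = p$ is prime, the case to which, as noted, the argument may be restricted. The sketch below is limited to that case (an assumption). It computes $\chi$ by Euler's criterion $a^{(p-1)/2} \bmod p$, enumerates $F$ as $0, 1, \ldots, p-1$, borders $Q$ by a row of 1's and a column of $\pm 1$'s, and then applies the two cases of Lemma 6. The function names are hypothetical.

```python
# Sketch for q = p prime only; all function names are illustrative (hypothetical).
def chi(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    a %= p
    if a == 0:
        return 0
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1


def paley_core(p):
    """The (p+1) x (p+1) matrix C of Proposition 7, with a_j = j in Z/pZ."""
    sign = 1 if p % 4 == 1 else -1
    rows = [[0] + [1] * p]
    for j in range(p):
        rows.append([sign] + [chi(j - k, p) for k in range(p)])
    return rows


def paley_hadamard(p):
    """Hadamard matrix of order p + 1 (p = 3 mod 4) or 2(p + 1) (p = 1 mod 4)."""
    C = paley_core(p)
    n = p + 1
    plus = [[C[i][j] + int(i == j) for j in range(n)] for i in range(n)]   # C + I
    if p % 4 == 3:
        return plus
    minus = [[C[i][j] - int(i == j) for j in range(n)] for i in range(n)]  # C - I
    neg_plus = [[-x for x in row] for row in plus]                          # -C - I
    top = [r1 + r2 for r1, r2 in zip(plus, minus)]
    bottom = [r1 + r2 for r1, r2 in zip(minus, neg_plus)]
    return top + bottom


def is_hadamard(H):
    n = len(H)
    return all(x in (1, -1) for row in H for x in row) and all(
        sum(H[r][i] * H[r][j] for r in range(n)) == (n if i == j else 0)
        for i in range(n) for j in range(n))


for p in [3, 5, 7, 11, 13]:
    H = paley_hadamard(p)
    print(p, len(H), is_hadamard(H))
```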

A Hadamard matrix of order 92 was found by Baumert, Golomb and Hall (1962), using a computer search and the following method proposed by Williamson (1944). Let $A, B, C, D$ be $d \times d$ matrices with entries $\pm 1$ and let
$$H = \begin{pmatrix} A & D & B & C \\ -D & A & -C & B \\ -B & C & A & -D \\ -C & -B & D & A \end{pmatrix},$$
i.e. $H = A \otimes I + B \otimes i + C \otimes j + D \otimes k$, where the $4 \times 4$ matrices $I, i, j, k$ are matrix representations of the unit quaternions. It may be immediately verified that $H$ is a Hadamard matrix of order $n = 4d$ if
$$A^tA + B^tB + C^tC + D^tD = 4dI_d$$
and
$$X^tY = Y^tX$$
for every two distinct matrices $X, Y$ from the set $\{A, B, C, D\}$. The first infinite class of Hadamard matrices of Williamson type was found by Turyn (1972), who showed that they exist for all orders $n = 2(q + 1)$, where $q$ is a prime power and $q \equiv 1 \bmod 4$. Lagrange's theorem that any positive integer is a sum of four squares suggests that Hadamard matrices of Williamson type may exist for all orders $n \equiv 0 \bmod 4$.
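The conditions above can be checked on a small case. The sketch below assembles $H$ from the block layout displayed above and verifies $HH^t = 4dI$. The particular choice $d = 3$, $A = J_3$, $B = C = D = 2I_3 - J_3$ is ours, not the text's. All four blocks are symmetric and commute, and $J_3^2 + 3(2I_3 - J_3)^2 = 12I_3$. The helper names are hypothetical.

```python
def neg(X):
    """Entrywise negative of a matrix."""
    return [[-x for x in row] for row in X]


def williamson(A, B, C, D):
    """Assemble the 4d x 4d block matrix of the Williamson construction from d x d blocks."""
    layout = [[A, D, B, C],
              [neg(D), A, neg(C), B],
              [neg(B), C, A, neg(D)],
              [neg(C), neg(B), D, A]]
    d = len(A)
    return [[x for X in block_row for x in X[i]] for block_row in layout for i in range(d)]


def gram_of_rows(H):
    """Return H H^t."""
    return [[sum(a * b for a, b in zip(r, s)) for s in H] for r in H]


d = 3
J3 = [[1] * d for _ in range(d)]
M3 = [[1 if i == j else -1 for j in range(d)] for i in range(d)]   # 2 I_3 - J_3
H = williamson(J3, M3, M3, M3)
n = 4 * d
print(gram_of_rows(H) == [[n if i == j else 0 for j in range(n)] for i in range(n)])  # True
```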

The Hadamard matrices constructed by Paley are either symmetric or of the form $I + S$, where $S$ is skew-symmetric. It has been conjectured that in fact Hadamard matrices of both these types exist for all orders $n \equiv 0 \bmod 4$.


3 The Art of Weighing 


It was observed by Yates (1935) that, if several quantities are to be measured, more accurate results may be obtained by measuring suitable combinations of them than by measuring each separately. Suppose, for definiteness, that we have $m$ objects whose weights are to be determined and we perform $n \ge m$ weighings. The whole experiment may be represented by an $n \times m$ matrix $A = (\alpha_{jk})$. If the $k$-th object is not involved in the $j$-th weighing, then $\alpha_{jk} = 0$; if it is involved, then $\alpha_{jk} = +1$ or $-1$ according as it is placed in the left-hand or right-hand pan of the balance. The individual weights $\xi_1, \dots, \xi_m$ are connected with the observed results $\eta_1, \dots, \eta_n$ of the weighings by the system of linear equations
$$y = Ax, \qquad (1)$$
where $x = (\xi_1, \dots, \xi_m)^t \in \mathbb{R}^m$ and $y = (\eta_1, \dots, \eta_n)^t \in \mathbb{R}^n$.

We will again denote by $\|y\|$ the Euclidean norm $(|\eta_1|^2 + \cdots + |\eta_n|^2)^{1/2}$ of the vector $y$. Let $\hat{x} \in \mathbb{R}^m$ have as its coordinates the correct weights and let $\hat{y} = A\hat{x}$. If, because of errors of measurement, $y$ ranges over the ball $\|y - \hat{y}\| \le \rho$ in $\mathbb{R}^n$, then $x$ ranges over the ellipsoid $(x - \hat{x})^tA^tA(x - \hat{x}) \le \rho^2$ in $\mathbb{R}^m$. Since the volume of this ellipsoid is $[\det(A^tA)]^{-1/2}$ times the volume of the ball of radius $\rho$ in $\mathbb{R}^m$, we may regard the best choice of the design matrix $A$ to be that for which the ellipsoid has minimum volume. Thus we are led to the problem of maximizing $\det(A^tA)$ among all $n \times m$ matrices $A = (\alpha_{jk})$ with $\alpha_{jk} \in \{0, -1, 1\}$.

A different approach to the best choice of design matrix leads (by §2) to a similar result. If $n > m$ the linear system (1) is overdetermined. However, the least squares estimate for the solution of (1) is
$$x = Cy,$$
where $C = (A^tA)^{-1}A^t$. Let $a_k \in \mathbb{R}^n$ be the $k$-th column of $A$ and let $c_k \in \mathbb{R}^n$ be the $k$-th row of $C$. Since $CA = I_m$, we have $c_ka_k = 1$. If $y$ ranges over the ball $\|y - \hat{y}\| \le \rho$ in $\mathbb{R}^n$, then $\xi_k$ ranges over the real interval $|\xi_k - \hat{\xi}_k| \le \rho\|c_k\|$. Thus we may regard the optimal choice of the design matrix $A$ for measuring $\xi_k$ to be that for which $\|c_k\|$ is a minimum.

By Schwarz's inequality (Chapter I, §4),
$$\|c_k\|\,\|a_k\| \ge 1,$$
with equality only if $c_k$ is a scalar multiple of $a_k$. Also $\|a_k\| \le n^{1/2}$, since all elements of $A$ have absolute value at most 1. Hence $\|c_k\| \ge n^{-1/2}$, with equality if and only if all elements of $a_k$ have absolute value 1 and $c_k = a_k/n$. It follows that the design matrix $A$ is optimal for measuring each of $\xi_1, \dots, \xi_m$ if all elements of $A$ have absolute value 1 and $A^tA = nI_m$. Moreover, in this case the least squares estimate for the solution of (1) is simply $x = A^ty/n$. Thus the individual weights are easily determined from the observed measurements by additions and subtractions, followed by a division by $n$.
Suppose, for example, that $m = 3$ and $n = 4$. If we take
$$A = \begin{pmatrix} + & + & + \\ + & + & - \\ - & + & + \\ + & - & + \end{pmatrix},$$
where $+$ and $-$ stand for 1 and $-1$ respectively, then $A^tA = 4I_3$. With this experimental design the individual weights may all be determined with twice the accuracy of the weighing procedure, since $\|c_k\| = n^{-1/2} = 1/2$.
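The following is a short exact-arithmetic sketch of this design, using the $4 \times 3$ matrix above. It confirms $A^tA = 4I_3$ and recovers hypothetical weights from noiseless readings via $x = A^ty/n$. It also shows that a reading error of norm $\rho$ moves each estimate by at most $\rho/2$. The weights and the error vector are made up for illustration.

```python
from fractions import Fraction

# Rows are weighings, columns are objects: +1 = left pan, -1 = right pan.
A = [[1, 1, 1],
     [1, 1, -1],
     [-1, 1, 1],
     [1, -1, 1]]
n, m = len(A), len(A[0])


def gram(A):
    """Return A^t A."""
    return [[sum(row[j] * row[k] for row in A) for k in range(len(A[0]))]
            for j in range(len(A[0]))]


def estimate(A, y):
    """Least squares estimate x = A^t y / n, valid when A^t A = n I_m."""
    n = len(A)
    return [Fraction(sum(A[i][k] * y[i] for i in range(n)), n) for k in range(len(A[0]))]


assert gram(A) == [[4, 0, 0], [0, 4, 0], [0, 0, 4]]

true_x = [Fraction(3), Fraction(5), Fraction(7)]              # hypothetical weights
y = [sum(A[i][k] * true_x[k] for k in range(m)) for i in range(n)]
noise = [Fraction(1, 10), Fraction(-1, 10), Fraction(1, 10), Fraction(1, 10)]
y_obs = [a + e for a, e in zip(y, noise)]

errors = [xh - xt for xh, xt in zip(estimate(A, y_obs), true_x)]
noise_sq = sum(e * e for e in noise)
# Each coordinate error is at most ||noise|| / 2: twice the accuracy of one weighing.
print(all(err * err <= noise_sq / 4 for err in errors))       # True
```

With this particular error vector the bound is attained for the third object, so the factor 2 cannot be improved.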

The next result shows, in particular, that if we wish to maximize $\det(A^tA)$ among the $n \times m$ matrices $A$ with all entries 0, 1 or $-1$, then we may restrict attention to those with all entries 1 or $-1$.


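Proposition 8 Let $\alpha, \beta$ be real numbers with $\alpha < \beta$ and let $\mathcal{S}$ be the set of all $n \times m$ matrices $A = (\alpha_{jk})$ such that $\alpha \le \alpha_{jk} \le \beta$ for all $j, k$. Then there exists an $n \times m$ matrix $M = (\mu_{jk})$ such that $\mu_{jk} \in \{\alpha, \beta\}$ for all $j, k$ and
$$\det(M^tM) = \max_{A \in \mathcal{S}} \det(A^tA).$$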
Proof For any $n \times m$ real matrix $A$, either the symmetric matrix $A^tA$ is positive definite and $\det(A^tA) > 0$, or $A^tA$ is positive semidefinite and $\det(A^tA) = 0$. Since the result is obvious if $\det(A^tA) = 0$ for every $A \in \mathcal{S}$, we assume that $\det(A^tA) > 0$ for some $A \in \mathcal{S}$. This implies $m \le n$. Partition such an $A$ in the form
$$A = (v \;\; B),$$
where $v$ is the first column of $A$ and $B$ is the remainder. Then
$$A^tA = \begin{pmatrix} v^tv & v^tB \\ B^tv & B^tB \end{pmatrix}$$
and $B^tB$ is also a positive definite symmetric matrix. By multiplying $A^tA$ on the left by
$$\begin{pmatrix} 1 & -v^tB(B^tB)^{-1} \\ 0 & I \end{pmatrix}$$
and taking determinants, we see that
$$\det(A^tA) = f(v)\det(B^tB),$$
where
$$f(v) = v^tv - v^tB(B^tB)^{-1}B^tv.$$
We can write $f(v) = v^tQv$, where
$$Q = I - P, \qquad P = B(B^tB)^{-1}B^t.$$
From $P^t = P = P^2$ we obtain $Q^t = Q = Q^2$. Hence $Q = Q^tQ$ is a positive semidefinite symmetric matrix.


If $v = \theta v_1 + (1 - \theta)v_2$, where $v_1$ and $v_2$ are fixed vectors and $\theta \in \mathbb{R}$, then $f(v)$ is a quadratic polynomial $g(\theta)$ in $\theta$ whose leading coefficient
$$v_1^tQv_1 - v_2^tQv_1 - v_1^tQv_2 + v_2^tQv_2 = (v_1 - v_2)^tQ(v_1 - v_2)$$
is nonnegative, since $Q$ is positive semidefinite. It follows that $g(\theta)$ attains its maximum value in the interval $0 \le \theta \le 1$ at an endpoint.
Put
$$\mu = \sup_{A \in \mathcal{S}} \det(A^tA).$$
Since $\det(A^tA)$ is a continuous function of the $mn$ variables $\alpha_{jk}$ and $\mathcal{S}$ may be regarded as a compact set in $\mathbb{R}^{mn}$, $\mu$ is finite and there exists a matrix $A \in \mathcal{S}$ for which $\det(A^tA) = \mu$. By repeatedly applying the argument of the preceding paragraph to this $A$, changing one entry of the first column at a time, we may replace it by one for which every entry in the first column is either $\alpha$ or $\beta$ and for which also $\det(A^tA) = \mu$. These operations do not affect the submatrix $B$ formed by the last $m - 1$ columns of $A$. By interchanging the $k$-th column of $A$ with the first, which does not alter the value of $\det(A^tA)$, we may apply the same argument to every other column of $A$.


The proof of Proposition 8 actually shows that if $C$ is a compact subset of $\mathbb{R}^n$ and if $\mathcal{S}$ is the set of all $n \times m$ matrices $A$ whose columns are in $C$, then there exists an $n \times m$ matrix $M$ whose columns are extreme points of $C$ such that
$$\det(M^tM) = \sup_{A \in \mathcal{S}} \det(A^tA).$$
Here $e \in C$ is said to be an extreme point of $C$ if there do not exist distinct $v_1, v_2 \in C$ and $\theta \in (0, 1)$ such that $e = \theta v_1 + (1 - \theta)v_2$.

The preceding discussion concerns weighings by a chemical balance. If instead we use a spring balance, then we are similarly led to the problem of maximizing $\det(B^tB)$ among all $n \times m$ matrices $B = (\beta_{jk})$ with $\beta_{jk} = 1$ or 0 according as the $k$-th object is or is not involved in the $j$-th weighing. Moreover other types of measurement
lead to the same problem. A spectrometer sorts electromagnetic radiation into bundles 
of rays, each bundle having a characteristic wavelength. Instead of measuring the 
intensity of each bundle separately, we can measure the intensity of various combi- 
nations of bundles by using masks with open or closed slots. 

It will now be shown that in the case m = n the chemical and spring balance 
problems are essentially equivalent. 


Lemma 9 If $B$ is an $(n - 1) \times (n - 1)$ matrix of 0's and 1's, and if $J_n$ is the $n \times n$ matrix whose entries are all 1, then
$$A = J_n - 2\begin{pmatrix} 0 & 0 \\ 0 & B \end{pmatrix}$$
is an $n \times n$ matrix of 1's and $-1$'s, whose first row and column contain only 1's, such that
$$\det A = (-2)^{n-1}\det B.$$
Moreover, every $n \times n$ matrix of 1's and $-1$'s, whose first row and column contain only 1's, is obtained in this way.

Proof Since
$$A = \begin{pmatrix} 1 & 0 \\ e_{n-1}^t & I_{n-1} \end{pmatrix}\begin{pmatrix} 1 & e_{n-1} \\ 0 & -2B \end{pmatrix},$$
where $e_m$ denotes a row of $m$ 1's, the matrix $A$ has determinant $(-2)^{n-1}\det B$. The rest of the lemma is obvious.

Let $A$ be an $n \times n$ matrix with entries $\pm 1$. By multiplying rows and columns of $A$ by $-1$ we can make all elements in the first row and first column equal to 1 without altering the value of $\det(A^tA)$. It follows from Lemma 9 that if $\alpha_n$ is the maximum of $\det(A^tA)$ among all $n \times n$ matrices $A = (\alpha_{jk})$ with $\alpha_{jk} \in \{-1, 1\}$, and if $\beta_{n-1}$ is the maximum of $\det(B^tB)$ among all $(n - 1) \times (n - 1)$ matrices $B = (\beta_{jk})$ with $\beta_{jk} \in \{0, 1\}$, then
$$\alpha_n = 2^{2n-2}\beta_{n-1}.$$
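The identity $\alpha_n = 2^{2n-2}\beta_{n-1}$ can be confirmed by brute force for very small $n$. The sketch below uses exhaustive search, so it is only practical up to about $n = 4$, and all names are our own. It computes both maxima directly, using $\det(A^tA) = (\det A)^2$ for square $A$. The helper `from_zero_one` builds the matrix $A$ of Lemma 9 from $B$, so the determinant relation of the lemma can be checked as well.

```python
from itertools import product


def det(M):
    """Exact integer determinant by fraction-free (Bareiss) elimination."""
    M = [list(row) for row in M]
    n = len(M)
    sign, prev = 1, 1
    for i in range(n - 1):
        if M[i][i] == 0:
            swap = next((r for r in range(i + 1, n) if M[r][i] != 0), None)
            if swap is None:
                return 0
            M[i], M[swap] = M[swap], M[i]
            sign = -sign
        for r in range(i + 1, n):
            for c in range(i + 1, n):
                M[r][c] = (M[r][c] * M[i][i] - M[r][i] * M[i][c]) // prev
        prev = M[i][i]
    return sign * M[n - 1][n - 1] if n else 1


def from_zero_one(B):
    """A = J_n - 2 [[0, 0], [0, B]] for an (n-1) x (n-1) matrix B of 0's and 1's."""
    n = len(B) + 1
    return [[1 if j == 0 or k == 0 else 1 - 2 * B[j - 1][k - 1] for k in range(n)]
            for j in range(n)]


def all_matrices(size, entries):
    """Yield every size x size matrix with entries taken from the given tuple."""
    for flat in product(entries, repeat=size * size):
        yield [list(flat[i * size:(i + 1) * size]) for i in range(size)]


def alpha(n):
    """Max of det(A^t A) = (det A)^2 over n x n matrices with entries +-1."""
    return max(det(A) ** 2 for A in all_matrices(n, (1, -1)))


def beta(n):
    """Max of det(B^t B) = (det B)^2 over n x n matrices with entries 0, 1."""
    return max(det(B) ** 2 for B in all_matrices(n, (0, 1)))


for n in (2, 3, 4):
    print(n, alpha(n), 2 ** (2 * n - 2) * beta(n - 1))   # the last two columns agree
```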
In rectangular coordinates the equation of an ellipse with centre at the origin has the form
$$Q := ax^2 + 2bxy + cy^2 = \text{const.} \qquad (*)$$

This is not the form in which the equation of an ellipse is often written, because of the 'cross product' term $2bxy$. However, we can bring it to that form by rotating the axes, so that the major axis of the ellipse lies along one coordinate axis and the minor axis along the other. This is possible because the major and minor axes are perpendicular to one another. These assertions will now be verified analytically.

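In matrix notation, $Q = z^tAz$, where
$$A = \begin{pmatrix} a & b \\ b & c \end{pmatrix}, \qquad z = \begin{pmatrix} x \\ y \end{pmatrix}.$$
A rotation of coordinates has the form $z = Tw$, where
$$T = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}, \qquad w = \begin{pmatrix} u \\ v \end{pmatrix}.$$
Then $Q = w^tBw$, where $B = T^tAT$. Multiplying out, we obtain
$$B = \begin{pmatrix} a' & b' \\ b' & c' \end{pmatrix},$$
where
$$b' = b(\cos^2\theta - \sin^2\theta) - (a - c)\sin\theta\cos\theta.$$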


To eliminate the cross product term we choose $\theta$ so that $b(\cos^2\theta - \sin^2\theta) = (a - c)\sin\theta\cos\theta$, i.e. $2b\cos 2\theta = (a - c)\sin 2\theta$, or
$$\tan 2\theta = 2b/(a - c)$$
(with $\theta = \pi/4$ if $a = c$).
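Numerically, $\theta$ is best obtained with a two-argument arctangent, which also covers the case $a = c$. The following is a small Python sketch, with names of our own choosing, that computes $\theta$ and the entries $a', b', c'$ of $B = T^tAT$.

```python
import math


def principal_angle(a, b, c):
    """Angle theta with 2b cos 2theta = (a - c) sin 2theta, i.e. tan 2theta = 2b / (a - c)."""
    return 0.5 * math.atan2(2 * b, a - c)


def rotate(a, b, c, theta):
    """Entries a', b', c' of B = T^t A T for A = [[a, b], [b, c]]."""
    co, si = math.cos(theta), math.sin(theta)
    a1 = a * co * co + 2 * b * si * co + c * si * si
    b1 = b * (co * co - si * si) - (a - c) * si * co
    c1 = a * si * si - 2 * b * si * co + c * co * co
    return a1, b1, c1


a, b, c = 5.0, 2.0, 2.0                 # the ellipse 5x^2 + 4xy + 2y^2 = const.
theta = principal_angle(a, b, c)
print(rotate(a, b, c, theta))           # approximately (6.0, 0.0, 1.0)
```

In the rotated coordinates the example becomes $6u^2 + v^2 = \text{const.}$, up to rounding.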
The preceding argument applies equally well to a hyperbola, since it is also described by an equation of the form $(*)$. We now wish to extend this result to higher dimensions. An $n$-dimensional conic with centre at the origin has the form
$$Q := x^tAx = \text{const.},$$
where $x \in \mathbb{R}^n$ and $A$ is an $n \times n$ real symmetric matrix. The analogue of a rotation is a linear transformation $x = Ty$ which preserves Euclidean lengths, i.e. $x^tx = y^ty$. This holds for all $y \in \mathbb{R}^n$ if and only if
$$T^tT = I.$$
A matrix $T$ which satisfies this condition is said to be orthogonal. Then $T^t = T^{-1}$ and hence also $TT^t = I$.

The single most important fact about real symmetric matrices is the principal axes 
transformation: 


Theorem 10 If $H$ is an $n \times n$ real symmetric matrix, then there exists an $n \times n$ real orthogonal matrix $U$ such that $U^tHU$ is a diagonal matrix:
$$U^tHU = \mathrm{diag}[\lambda_1, \dots, \lambda_n].$$
Proof Let $f : \mathbb{R}^n \to \mathbb{R}$ be the map defined by
$$f(x) = x^tHx.$$
Since $f$ is continuous and the unit sphere $S = \{x \in \mathbb{R}^n : x^tx = 1\}$ is compact,
$$\lambda_1 := \sup_{x \in S} f(x)$$
is finite and there exists an $x_1 \in S$ such that $f(x_1) = \lambda_1$. We are going to show that, if $x \in S$ and $x^tx_1 = 0$, then also $x^tHx_1 = 0$.
For any real $\varepsilon$, put
$$y = (x_1 + \varepsilon x)/(1 + \varepsilon^2)^{1/2}.$$
Then also $y \in S$, since $x$ and $x_1$ are orthogonal vectors of unit length. Hence $f(y) \le f(x_1)$, by the definition of $x_1$. But $x_1^tHx = x^tHx_1$, since $H$ is symmetric, and hence
$$f(y) = \{f(x_1) + 2\varepsilon x^tHx_1 + \varepsilon^2 f(x)\}/(1 + \varepsilon^2).$$
For small $|\varepsilon|$ it follows that
$$f(y) = f(x_1) + 2\varepsilon x^tHx_1 + O(\varepsilon^2).$$
If $x^tHx_1$ were different from zero, we could choose $\varepsilon$ small and with the same sign as it, and obtain the contradiction $f(y) > f(x_1)$.

On the intersection of the unit sphere $S$ with the hyperplane $x^tx_1 = 0$, the function $f$ attains its maximum value $\lambda_2$ at some point $x_2$. Similarly, on the intersection of the unit sphere $S$ with the $(n-2)$-dimensional subspace of all $x$ such that $x^tx_1 = x^tx_2 = 0$, the function $f$ attains its maximum value $\lambda_3$ at some point $x_3$. Proceeding in this way we obtain $n$ mutually orthogonal unit vectors $x_1, \dots, x_n$. Moreover $x_j^tHx_j = \lambda_j$ and, by the argument of the previous paragraph, $x_j^tHx_k = 0$ if $j > k$. It follows that the matrix $U$ with columns $x_1, \dots, x_n$ satisfies all the requirements.
It should be noted that, if $U$ is any orthogonal matrix such that $U^tHU = \mathrm{diag}[\lambda_1, \dots, \lambda_n]$ then, since $UU^t = I$, the columns $x_1, \dots, x_n$ of $U$ satisfy
$$Hx_j = \lambda_jx_j \quad (1 \le j \le n).$$
That is, $\lambda_j$ is an eigenvalue of $H$ and $x_j$ a corresponding eigenvector $(1 \le j \le n)$.
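Theorem 10 is proved above by a maximization argument. In numerical practice the same diagonalization is often computed by Jacobi's eigenvalue method, which is a different technique swapped in here for illustration. It applies the $2 \times 2$ rotation of the ellipse example, with $\tan 2\theta = 2b/(a - c)$, to one off-diagonal pair $(p, q)$ at a time until the off-diagonal part is negligible. The sketch below is a plain cyclic Jacobi iteration. The function names, tolerance and sweep limit are our own choices, not the text's. The columns of the returned $U$ approximate the eigenvectors $x_j$.

```python
import math


def mat_mul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def plane_rotation(n, p, q, theta):
    """Identity matrix except for [[cos, -sin], [sin, cos]] in rows and columns p, q."""
    T = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    c, s = math.cos(theta), math.sin(theta)
    T[p][p], T[p][q], T[q][p], T[q][q] = c, -s, s, c
    return T


def jacobi_eigen(H, tol=1e-20, max_sweeps=50):
    """Cyclic Jacobi method: return (eigenvalues, U) with U orthogonal and U^t H U ~ diagonal."""
    n = len(H)
    A = [[float(x) for x in row] for row in H]
    U = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Same rule as for the ellipse: tan 2theta = 2b / (a - c).
                theta = 0.5 * math.atan2(2 * A[p][q], A[p][p] - A[q][q])
                T = plane_rotation(n, p, q, theta)
                A = mat_mul(transpose(T), mat_mul(A, T))   # A <- T^t A T
                U = mat_mul(U, T)                          # accumulate U = T_1 T_2 ...
    return [A[i][i] for i in range(n)], U


H = [[2, 1, 0], [1, 2, 1], [0, 1, 2]]
eigenvalues, U = jacobi_eigen(H)
print(sorted(eigenvalues))   # approximately [2 - sqrt(2), 2, 2 + sqrt(2)]
```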

A real symmetric matrix $A$ is positive definite if $x^tAx > 0$ for every real vector $x \neq 0$ (and positive semidefinite if $x^tAx \ge 0$ for every real vector $x$, with equality for some $x \neq 0$). It follows from Theorem 10 that two real symmetric matrices can be simultaneously diagonalized if one of them is positive definite, although the transforming matrix may not be orthogonal:


Proposition 11 If $A$ and $B$ are $n \times n$ real symmetric matrices, with $A$ positive definite, then there exists an $n \times n$ nonsingular real matrix $T$ such that $T^tAT$ and $T^tBT$ are both diagonal matrices.

Proof By Theorem 10, there exists a real orthogonal matrix $U$ such that $U^tAU$ is a diagonal matrix:
$$U^tAU = \mathrm{diag}[\lambda_1, \dots, \lambda_n].$$
Moreover, $\lambda_j > 0$ $(1 \le j \le n)$, since $A$ is positive definite. Hence there exists $\delta_j > 0$ such that $\delta_j^2 = 1/\lambda_j$. If $D = \mathrm{diag}[\delta_1, \dots, \delta_n]$, then $D^tU^tAUD = I$. By Theorem 10 again, there exists a real orthogonal matrix $V$ such that
$$V^t(D^tU^tBUD)V = \mathrm{diag}[\mu_1, \dots, \mu_n]$$
is a diagonal matrix. Hence we can take $T = UDV$, since then also $T^tAT = V^tV = I$.


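Proposition 11 will now be used to obtain an inequality due to Fischer (1908):

Proposition 12 If $G$ is a positive definite real symmetric matrix, and if
$$G = \begin{pmatrix} G_1 & G_2 \\ G_2^t & G_3 \end{pmatrix}$$
is any partition of $G$ with $G_1$ and $G_3$ square, then
$$\det G \le \det G_1 \cdot \det G_3,$$
with equality if and only if $G_2 = 0$.

Proof Since $G_3$ is also positive definite, we can write $G = Q^tHQ$, where
$$Q = \begin{pmatrix} I & 0 \\ G_3^{-1}G_2^t & I \end{pmatrix}, \qquad H = \begin{pmatrix} H_1 & 0 \\ 0 & G_3 \end{pmatrix}$$
and $H_1 = G_1 - G_2G_3^{-1}G_2^t$. Since $\det G = \det H_1 \cdot \det G_3$, we need only show that $\det H_1 \le \det G_1$, with equality only if $G_2 = 0$.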

Since $G_1$ and $H_1$ are both positive definite, they can be simultaneously diagonalized. Thus, if $G_1$ and $H_1$ are $p \times p$ matrices, there exists a nonsingular real matrix $T$ such that
$$T^tG_1T = \mathrm{diag}[\gamma_1, \dots, \gamma_p], \qquad T^tH_1T = \mathrm{diag}[\delta_1, \dots, \delta_p].$$
Since $G_3^{-1}$ is positive definite, $u^t(G_1 - H_1)u = (G_2^tu)^tG_3^{-1}(G_2^tu) \ge 0$ for any $u \in \mathbb{R}^p$. Hence $\gamma_i \ge \delta_i > 0$ for $i = 1, \dots, p$ and $\det G_1 \ge \det H_1$. Moreover $\det G_1 = \det H_1$ only if $\gamma_i = \delta_i$ for all $i$.

Hence if $\det G_1 = \det H_1$, then $G_1 = H_1$, i.e. $G_2G_3^{-1}G_2^t = 0$. Thus $w^tG_3^{-1}w = 0$ for any vector $w = G_2^tu$. Since $w^tG_3^{-1}w = 0$ implies $w = 0$, it follows that $G_2 = 0$.


From Proposition 12 we obtain by induction

Proposition 13 If $G = (\gamma_{jk})$ is an $m \times m$ positive definite real symmetric matrix, then
$$\det G \le \gamma_{11}\gamma_{22}\cdots\gamma_{mm},$$
with equality if and only if $G$ is a diagonal matrix.

By applying Proposition 13 to the matrix $G = A^tA$, we obtain again Proposition 3. Proposition 13 may be sharpened in the following way:


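Proposition 14 If $G = (\gamma_{jk})$ is an $m \times m$ positive definite real symmetric matrix, then
$$\det G \le \gamma_{11}\prod_{j=2}^{m}\left(\gamma_{jj} - \gamma_{1j}^2/\gamma_{11}\right),$$
with equality if and only if $\gamma_{jk} = \gamma_{1j}\gamma_{1k}/\gamma_{11}$ for $2 \le j < k \le m$.

Proof If
$$T = \begin{pmatrix} 1 & g \\ 0 & I_{m-1} \end{pmatrix},$$
where $g = (-\gamma_{12}/\gamma_{11}, \dots, -\gamma_{1m}/\gamma_{11})$, then
$$T^tGT = \begin{pmatrix} \gamma_{11} & 0 \\ 0 & H \end{pmatrix},$$
where $H = (\eta_{jk})$ is an $(m - 1) \times (m - 1)$ positive definite real symmetric matrix with entries
$$\eta_{jk} = \gamma_{jk} - \gamma_{1j}\gamma_{1k}/\gamma_{11} \quad (2 \le j, k \le m).$$
Since $\det G = \gamma_{11}\det H$, the result now follows from Proposition 13.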
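Propositions 12 to 14 are easy to test numerically. The sketch below uses exact rational arithmetic. It draws random positive definite matrices of the form $G = X^tX + I$ with small integer $X$; this generator is our own choice. For each one it checks $\det G \le \det G_1 \det G_3 \le \gamma_{11}\cdots\gamma_{mm}$. It also checks that the bound of Proposition 14 lies between $\det G$ and the diagonal product.

```python
import random
from fractions import Fraction


def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    n, result = len(M), Fraction(1)
    for i in range(n):
        pivot = next((r for r in range(i, n) if M[r][i] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != i:
            M[i], M[pivot] = M[pivot], M[i]
            result = -result
        result *= M[i][i]
        for r in range(i + 1, n):
            factor = M[r][i] / M[i][i]
            for c in range(i, n):
                M[r][c] -= factor * M[i][c]
    return result


def random_pd(m, rng):
    """A positive definite matrix G = X^t X + I, with X a random small-integer matrix."""
    X = [[rng.randint(-3, 3) for _ in range(m)] for _ in range(m)]
    return [[sum(X[i][j] * X[i][k] for i in range(m)) + (j == k) for k in range(m)]
            for j in range(m)]


def bounds(G, p):
    """Return det G, the Fischer bound for the split after index p,
    the Hadamard bound (Proposition 13) and the bound of Proposition 14."""
    m = len(G)
    G1 = [row[:p] for row in G[:p]]
    G3 = [row[p:] for row in G[p:]]
    fischer = det(G1) * det(G3)
    hadamard = 1
    for j in range(m):
        hadamard *= G[j][j]
    g11 = Fraction(G[0][0])
    prop14 = g11
    for j in range(1, m):
        prop14 *= G[j][j] - Fraction(G[0][j]) ** 2 / g11
    return det(G), fischer, hadamard, prop14


rng = random.Random(1)
G = random_pd(4, rng)
d, fischer, hadamard, prop14 = bounds(G, 2)
print(d <= fischer <= hadamard, d <= prop14 <= hadamard)   # True True
```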


Some further inequalities for the determinants of positive definite matrices will now be derived, which will be applied to Hadamard's determinant problem in the next section. We again denote by $J_m$ the $m \times m$ matrix whose entries are all 1.
Lemma 15 If $C = \alpha I_m + \beta J_m$ for some real $\alpha, \beta$, then
$$\det C = \alpha^{m-1}(\alpha + m\beta).$$
Moreover, if $\det C \neq 0$, then $C^{-1} = \gamma I_m + \delta J_m$, where $\gamma = \alpha^{-1}$ and $\delta = -\beta\alpha^{-1}(\alpha + m\beta)^{-1}$.

Proof Subtract the first row of $C$ from each of the remaining rows, and then add to the first column of the resulting matrix each of the remaining columns. These operations do not alter the determinant and replace $C$ by an upper triangular matrix with main diagonal entries $\alpha + m\beta$ (once) and $\alpha$ ($m - 1$ times). Hence $\det C = \alpha^{m-1}(\alpha + m\beta)$.

If $\det C \neq 0$ and if $\gamma, \delta$ are defined as in the statement of the lemma, then from $J_m^2 = mJ_m$ it follows directly that
$$(\alpha I_m + \beta J_m)(\gamma I_m + \delta J_m) = I_m.$$
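The following is a quick exact check of Lemma 15, both the determinant formula and the inverse. It is a sketch, and the helper names are ours.

```python
from fractions import Fraction


def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    n, result = len(M), Fraction(1)
    for i in range(n):
        pivot = next((r for r in range(i, n) if M[r][i] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != i:
            M[i], M[pivot] = M[pivot], M[i]
            result = -result
        result *= M[i][i]
        for r in range(i + 1, n):
            factor = M[r][i] / M[i][i]
            for c in range(i, n):
                M[r][c] -= factor * M[i][c]
    return result


def scalar_plus_ones(alpha, beta, m):
    """The m x m matrix alpha I_m + beta J_m with exact rational entries."""
    return [[Fraction(alpha) * (j == k) + Fraction(beta) for k in range(m)] for j in range(m)]


def lemma15_inverse(alpha, beta, m):
    """gamma I_m + delta J_m with gamma = 1/alpha and delta = -beta / (alpha (alpha + m beta))."""
    alpha, beta = Fraction(alpha), Fraction(beta)
    return scalar_plus_ones(1 / alpha, -beta / (alpha * (alpha + m * beta)), m)


def mat_mul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


alpha, beta, m = 3, 1, 4
C = scalar_plus_ones(alpha, beta, m)
print(det(C) == alpha ** (m - 1) * (alpha + m * beta))   # True (27 * 7 = 189)
```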


Proposition 16 Let $G = (\gamma_{jk})$ be an $m \times m$ positive definite real symmetric matrix such that $|\gamma_{jk}| \ge \beta$ for all $j, k$ and $\gamma_{jj} \le \alpha + \beta$ for all $j$, where $\alpha, \beta > 0$. Then
$$\det G \le \alpha^{m-1}(\alpha + m\beta). \qquad (2)$$
Moreover, equality holds if and only if there exists a diagonal matrix $D$, with main diagonal elements $\pm 1$, such that
$$DGD = \alpha I_m + \beta J_m.$$

Proof The result is trivial if $m = 1$ and is easily verified if $m = 2$. We assume $m > 2$ and use induction on $m$. By replacing $G$ by $DGD$, where $D$ is a diagonal matrix whose main diagonal elements have absolute value 1, we may suppose that $\gamma_{1k} \ge \beta$ for $2 \le k \le m$. Since the determinant is a linear function of its rows, we have
$$\det G = (\gamma_{11} - \beta)\delta + \eta,$$
where $\delta$ is the determinant of the matrix obtained from $G$ by omitting the first row and column and $\eta$ is the determinant of the matrix $H$ obtained from $G$ by replacing $\gamma_{11}$ by $\beta$. By the induction hypothesis,
$$\delta \le \alpha^{m-2}(\alpha + m\beta - \beta).$$
If $\eta \le 0$, it follows that
$$\det G \le \alpha^{m-1}(\alpha + m\beta - \beta) < \alpha^{m-1}(\alpha + m\beta).$$
Thus we now suppose $\eta > 0$. Then $H$ is positive definite, since the submatrix obtained by omitting the first row and column is positive definite. By Proposition 14,
$$\eta \le \beta\prod_{j=2}^{m}\left(\gamma_{jj} - \gamma_{1j}^2/\beta\right),$$
with equality only if $\gamma_{jk} = \gamma_{1j}\gamma_{1k}/\beta$ for $2 \le j < k \le m$. Hence $\eta \le \alpha^{m-1}\beta$, with equality only if $\gamma_{jj} = \alpha + \beta$ for $2 \le j \le m$ and $\gamma_{jk} = \beta$ for $1 \le j < k \le m$. Consequently
$$\det G \le \alpha^{m-1}(\alpha + m\beta - \beta) + \alpha^{m-1}\beta = \alpha^{m-1}(\alpha + m\beta),$$
with equality only if $G = \alpha I_m + \beta J_m$.


A square matrix will be called a signed permutation matrix if each row and column contains only one nonzero entry and this entry is 1 or $-1$.


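Proposition 17 Let $G = (\gamma_{jk})$ be an $m \times m$ positive definite real symmetric matrix such that $\gamma_{jj} \le \alpha + \beta$ for all $j$ and either $\gamma_{jk} = 0$ or $|\gamma_{jk}| \ge \beta$ for all $j, k$, where $\alpha, \beta > 0$.

Suppose in addition that, for distinct $i, j, k$, $\gamma_{ik} = \gamma_{jk} = 0$ implies $\gamma_{ij} \neq 0$. Then
$$\det G \le \alpha^{m-2}(\alpha + m\beta/2)^2 \quad \text{if } m \text{ is even},$$
$$\det G \le \alpha^{m-2}(\alpha + (m+1)\beta/2)(\alpha + (m-1)\beta/2) \quad \text{if } m \text{ is odd}. \qquad (3)$$
Moreover, equality holds if and only if there is a signed permutation matrix $U$ such that
$$U^tGU = \begin{pmatrix} L & 0 \\ 0 & M \end{pmatrix},$$
where
$$L = M = \alpha I_{m/2} + \beta J_{m/2} \quad \text{if } m \text{ is even},$$
$$L = \alpha I_{(m+1)/2} + \beta J_{(m+1)/2}, \quad M = \alpha I_{(m-1)/2} + \beta J_{(m-1)/2} \quad \text{if } m \text{ is odd}.$$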
Proof We are going to establish the inequality
$$\det G \le \alpha^{m-2}(\alpha + s\beta)(\alpha + m\beta - s\beta), \qquad (4)$$
where $s$ is the maximum number of zero elements in any row of $G$. Since, as a function of the real variable $s$, the quadratic on the right of (4) attains its maximum value for $s = m/2$, and has the same value for $s = (m + 1)/2$ as for $s = (m - 1)/2$, this will imply (3). It will also imply that if equality holds in (3), then $s = m/2$ if $m$ is even and $s = (m + 1)/2$ or $(m - 1)/2$ if $m$ is odd.

For $m = 2$ it is easily verified that (4) holds. We assume $m > 2$ and use induction. By performing the same signed permutation on rows and columns, we may suppose that the second row of $G$ has the maximum number $s$ of zero elements, and that all nonzero elements of the first row are positive and precede the zero elements. All the hypotheses of the proposition remain satisfied by the matrix $G$ after this operation.

Let $s'$ be the number of zero elements in the first row and put $r' = m - s'$. As in the proof of Proposition 16, we have
$$\det G = (\gamma_{11} - \beta)\delta + \eta,$$
where $\delta$ is the determinant of the matrix obtained from $G$ by omitting the first row and column and $\eta$ is the determinant of the matrix $H$ obtained from $G$ by replacing $\gamma_{11}$ by $\beta$. We partition $H$ in the form
$$H = \begin{pmatrix} L & N \\ N^t & M \end{pmatrix},$$
where $L, M$ are square matrices of orders $r', s'$ respectively. By construction all elements in the first row of $L$ are positive and all elements in the first row of $N$ are zero. Furthermore, by the hypotheses of the proposition, all elements of $M$ have absolute value $\ge \beta$.

By the induction hypothesis,
$$\delta \le \alpha^{m-3}(\alpha + s\beta)(\alpha + m\beta - \beta - s\beta).$$
If $\eta \le 0$, it follows immediately that (4) holds with strict inequality. Thus we now suppose $\eta > 0$. Then $H$ is positive definite and hence, by Fischer's inequality (Proposition 12), $\eta \le \det L \cdot \det M$, with equality only if $N = 0$. But, by Proposition 14,
$$\det L \le \beta\prod_{j=2}^{r'}\left(\gamma_{jj} - \gamma_{1j}^2/\beta\right) \le \alpha^{r'-1}\beta$$
and, by Proposition 16,
$$\det M \le \alpha^{s'-1}(\alpha + s'\beta).$$
Hence
$$\det G \le \alpha^{m-2}(\alpha + s\beta)(\alpha + m\beta - \beta - s\beta) + \alpha^{m-2}\beta(\alpha + s'\beta).$$
Since $s' \le s$, it follows that (4) holds and actually with strict inequality if $s' \neq s$. If equality holds in (4) then, by Proposition 14, we must have $L = \alpha I_{r'} + \beta J_{r'}$, and by Proposition 16 after normalization we must also have $M = \alpha I_{s'} + \beta J_{s'}$.


5 Application to Hadamard's Determinant Problem


We have seen that, if $A$ is an $n \times m$ real matrix with all entries $\pm 1$, then $\det(A^tA) \le n^m$, with strict inequality if $m > 2$ and $n$ is not divisible by 4. The question arises, what is the maximum value of $\det(A^tA)$ in such a case? In the present section we use the results of the previous section to obtain some answers to this question. We consider first the case where $n$ is odd.


Proposition 18 Let $A = (\alpha_{jk})$ be an $n \times m$ matrix with $\alpha_{jk} = \pm 1$ for all $j, k$. If $n$ is odd, then
$$\det(A^tA) \le (n - 1)^{m-1}(n - 1 + m).$$
Moreover, equality holds if and only if $n \equiv 1 \bmod 4$ and, after changing the signs of some columns of $A$,
$$A^tA = (n - 1)I_m + J_m.$$
Proof We may assume $\det(A^tA) \neq 0$ and thus $m \le n$. Then $A^tA = G = (\gamma_{jk})$ is a positive definite real symmetric matrix. For all $j, k$,
$$\gamma_{jk} = \alpha_{1j}\alpha_{1k} + \cdots + \alpha_{nj}\alpha_{nk}$$
is an integer and $\gamma_{jj} = n$. Moreover $\gamma_{jk}$ is odd for all $j, k$, being the sum of an odd number of $\pm 1$'s. Hence the matrix $G$ satisfies the hypotheses of Proposition 16 with $\alpha = n - 1$ and $\beta = 1$. Everything now follows from Proposition 16, except for the remark that if equality holds we must have $n \equiv 1 \bmod 4$.

But if $G = (n - 1)I_m + J_m$, then $\gamma_{jk} = 1$ for $j \neq k$. It now follows, by the argument used in the proof of Proposition 5, that any two distinct columns of $A$ have the same entries in exactly $(n + 1)/2$ rows, and any three distinct columns of $A$ have the same entries in exactly $(n + 3)/4$ rows. Thus $n \equiv 1 \bmod 4$.
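For small odd $n$ the bound of Proposition 18 can be compared with an exhaustive search. This sketch fixes the first column of $A$ to be all 1's, which is allowed because changing the sign of a row does not change $A^tA$. It then enumerates the remaining columns, so it is only practical for tiny $n$ and $m$. The function names are ours.

```python
from itertools import product


def det(M):
    """Exact integer determinant by fraction-free (Bareiss) elimination."""
    M = [list(row) for row in M]
    n = len(M)
    sign, prev = 1, 1
    for i in range(n - 1):
        if M[i][i] == 0:
            swap = next((r for r in range(i + 1, n) if M[r][i] != 0), None)
            if swap is None:
                return 0
            M[i], M[swap] = M[swap], M[i]
            sign = -sign
        for r in range(i + 1, n):
            for c in range(i + 1, n):
                M[r][c] = (M[r][c] * M[i][i] - M[r][i] * M[i][c]) // prev
        prev = M[i][i]
    return sign * M[n - 1][n - 1] if n else 1


def max_gram_det(n, m):
    """Exhaustive maximum of det(A^t A) over n x m matrices with entries +-1."""
    columns = list(product((1, -1), repeat=n))
    first = (1,) * n                      # row sign changes leave A^t A unchanged
    best = 0
    for rest in product(columns, repeat=m - 1):
        cols = (first,) + rest
        G = [[sum(x * y for x, y in zip(u, v)) for v in cols] for u in cols]
        best = max(best, det(G))
    return best


def prop18_bound(n, m):
    return (n - 1) ** (m - 1) * (n - 1 + m)


for n, m in [(3, 2), (3, 3), (5, 2), (5, 3)]:
    print(n, m, max_gram_det(n, m), prop18_bound(n, m))
```

For $n = 5$, $m = 3$ the bound 112 is attained. For $n = 3$, $m = 3$ (where $n \equiv 3 \bmod 4$) the search gives only 16, below the bound 20, in line with the equality condition.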


Even if $n \equiv 1 \bmod 4$ there is no guarantee that the upper bound in Proposition 18 is attained. However the question may be reduced to the existence of H-matrices if $m \neq n$. For suppose $m \le n - 1$ and there exists an $(n - 1) \times m$ H-matrix $B$. If we put
$$A = \begin{pmatrix} B \\ e_m \end{pmatrix},$$
where $e_m$ again denotes a row of $m$ 1's, then $A^tA = (n - 1)I_m + J_m$.
On the other hand if $m = n$, then equality in Proposition 18 can hold only under very restrictive conditions. For in this case
$$(\det A)^2 = \det(A^tA) = (n - 1)^{n-1}(2n - 1)$$
and, since $n$ is odd, $(n - 1)^{n-1}$ is a perfect square, so it follows that $2n - 1$ is the square of an integer. It is an open question whether the upper bound in Proposition 18 is always attained when $m = n$ and $2n - 1$ is a square. However the nature of an extremal matrix, if one exists, can be specified rather precisely:


Proposition 19 If $A = (\alpha_{jk})$ is an $n \times n$ matrix with $n > 1$ odd and $\alpha_{jk} = \pm 1$ for all $j, k$, then
$$\det(A^tA) \le (n - 1)^{n-1}(2n - 1).$$
Moreover if equality holds, then $n \equiv 1 \bmod 4$, $2n - 1 = s^2$ for some integer $s$ and, after changing the signs of some rows and columns of $A$, the matrix $A$ must satisfy
$$A^tA = (n - 1)I_n + J_n, \qquad AJ_n = sJ_n.$$


Proof By Proposition 18 and the preceding remarks, it only remains to show that if there exists an $A$ such that $A^tA = (n - 1)I_n + J_n$, then, by changing the signs of some rows, we can ensure that also $AJ_n = sJ_n$.

Since $\det(AA^t) = \det(A^tA)$, it follows from Proposition 18 (applied to $A^t$) that there exists a diagonal matrix $D$ with $D^2 = I$ such that
$$DAA^tD = (n - 1)I_n + J_n = A^tA.$$
Replacing $A$ by $DA$, we obtain $AA^t = A^tA$. Then $A$ commutes with $A^tA$ and hence also with $J_n$. Thus the rows and columns of $A$ all have the same sum $s$ and $AJ_n = sJ_n = A^tJ_n$. Moreover $s^2 = 2n - 1$, since
$$s^2J_n = sAJ_n = AA^tJ_n = (2n - 1)J_n.$$
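For $n = 5$ we have $2n - 1 = 9 = 3^2$. The matrix $A = J_5 - 2I_5$, which is our own choice of example, attains the bound of Proposition 19. The sketch below verifies $A^tA = 4I_5 + J_5$, $AJ_5 = 3J_5$ and $(\det A)^2 = 4^4 \cdot 9$.

```python
def det(M):
    """Exact integer determinant by fraction-free (Bareiss) elimination."""
    M = [list(row) for row in M]
    n = len(M)
    sign, prev = 1, 1
    for i in range(n - 1):
        if M[i][i] == 0:
            swap = next((r for r in range(i + 1, n) if M[r][i] != 0), None)
            if swap is None:
                return 0
            M[i], M[swap] = M[swap], M[i]
            sign = -sign
        for r in range(i + 1, n):
            for c in range(i + 1, n):
                M[r][c] = (M[r][c] * M[i][i] - M[r][i] * M[i][c]) // prev
        prev = M[i][i]
    return sign * M[n - 1][n - 1] if n else 1


n = 5
A = [[1 - 2 * (j == k) for k in range(n)] for j in range(n)]      # A = J_5 - 2 I_5
AtA = [[sum(A[i][j] * A[i][k] for i in range(n)) for k in range(n)] for j in range(n)]
row_sums = [sum(row) for row in A]
col_sums = [sum(A[i][k] for i in range(n)) for k in range(n)]

print(AtA == [[n if j == k else 1 for k in range(n)] for j in range(n)])   # (n-1) I + J
print(row_sums, col_sums)           # every sum is s = 3, and 3^2 = 2n - 1
print(det(A) ** 2 == (n - 1) ** (n - 1) * (2 * n - 1))                       # True
```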


The maximum value of $\det(A^tA)$ when $n \equiv 3 \bmod 4$ is still a bit of a mystery. We now consider the remaining case when $n$ is even, but not divisible by 4.


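Proposition 20 Let $A = (\alpha_{jk})$ be an $n \times m$ matrix with $2 \le m \le n$ and $\alpha_{jk} = \pm 1$ for all $j, k$. If $n \equiv 2 \bmod 4$ and $n > 2$, then
$$\det(A^tA) \le (n - 2)^{m-2}(n - 2 + m)^2 \quad \text{if } m \text{ is even},$$
$$\det(A^tA) \le (n - 2)^{m-2}(n - 1 + m)(n - 3 + m) \quad \text{if } m \text{ is odd}.$$
Moreover, equality holds if and only if there is a signed permutation matrix $U$ such that
$$U^tA^tAU = \begin{pmatrix} L & 0 \\ 0 & M \end{pmatrix},$$
where
$$L = M = (n - 2)I_{m/2} + 2J_{m/2} \quad \text{if } m \text{ is even},$$
$$L = (n - 2)I_{(m+1)/2} + 2J_{(m+1)/2}, \quad M = (n - 2)I_{(m-1)/2} + 2J_{(m-1)/2} \quad \text{if } m \text{ is odd}.$$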


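Proof We need only show that $G = A^tA$ satisfies the hypotheses of Proposition 17 with $\alpha = n - 2$ and $\beta = 2$. We certainly have $\gamma_{jj} = n$. Moreover all $\gamma_{jk}$ are even, since $n$ is even and
$$\gamma_{jk} = \alpha_{1j}\alpha_{1k} + \cdots + \alpha_{nj}\alpha_{nk}.$$
Hence $|\gamma_{jk}| \ge 2$ if $\gamma_{jk} \neq 0$. Finally, if $j, k, \ell$ are all different and $\gamma_{j\ell} = \gamma_{k\ell} = 0$, then
$$\sum_{i=1}^{n}(\alpha_{ij} + \alpha_{i\ell})(\alpha_{ik} + \alpha_{i\ell}) = n + \gamma_{jk}.$$
Each factor on the left is 0 or $\pm 2$, so the left side is divisible by 4. Since $n \equiv 2 \bmod 4$, it follows that also $\gamma_{jk} \equiv 2 \bmod 4$ and thus $\gamma_{jk} \neq 0$.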
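As with Proposition 18, the bound can be compared with an exhaustive search for $n = 6$ and small $m$. The sketch again fixes the first column to all 1's, since row sign changes leave $A^tA$ unchanged. Larger $m$ quickly becomes too slow for this naive search.

```python
from itertools import product


def det(M):
    """Exact integer determinant by fraction-free (Bareiss) elimination."""
    M = [list(row) for row in M]
    n = len(M)
    sign, prev = 1, 1
    for i in range(n - 1):
        if M[i][i] == 0:
            swap = next((r for r in range(i + 1, n) if M[r][i] != 0), None)
            if swap is None:
                return 0
            M[i], M[swap] = M[swap], M[i]
            sign = -sign
        for r in range(i + 1, n):
            for c in range(i + 1, n):
                M[r][c] = (M[r][c] * M[i][i] - M[r][i] * M[i][c]) // prev
        prev = M[i][i]
    return sign * M[n - 1][n - 1] if n else 1


def max_gram_det(n, m):
    """Exhaustive maximum of det(A^t A) over n x m matrices with entries +-1."""
    columns = list(product((1, -1), repeat=n))
    first = (1,) * n
    best = 0
    for rest in product(columns, repeat=m - 1):
        cols = (first,) + rest
        G = [[sum(x * y for x, y in zip(u, v)) for v in cols] for u in cols]
        best = max(best, det(G))
    return best


def prop20_bound(n, m):
    if m % 2 == 0:
        return (n - 2) ** (m - 2) * (n - 2 + m) ** 2
    return (n - 2) ** (m - 2) * (n - 1 + m) * (n - 3 + m)


for m in (2, 3):
    print(m, max_gram_det(6, m), prop20_bound(6, m))   # 2 36 36, then 3 192 192
```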


Again there is no guarantee that the upper bound in Proposition 20 is attained. However the question may be reduced to the existence of H-matrices if $m \neq n, n - 1$. For suppose $m \le n - 2$ and there exists an $(n - 2) \times m$ H-matrix $B$. If we put
$$A = \begin{pmatrix} B \\ (e_r \;\; e_s) \\ (e_r \;\; -e_s) \end{pmatrix},$$
where $e_r, e_s$ denote rows of $r$ and $s$ 1's and $r + s = m$, then
$$A^tA = \begin{pmatrix} (n - 2)I_r + 2J_r & 0 \\ 0 & (n - 2)I_s + 2J_s \end{pmatrix}.$$
Thus the upper bound in Proposition 20 is attained by taking $r = s = m/2$ when $m$ is even and $r = (m + 1)/2$, $s = (m - 1)/2$ when $m$ is odd.
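The construction can be tried with $n = 6$ and $B$ taken from a $4 \times 4$ Hadamard matrix. The sketch assumes the two appended rows are $(e_r, e_s)$ and $(e_r, -e_s)$, which is one choice that produces the block diagonal $A^tA$ displayed above. It then checks that the bound of Proposition 20 is attained for $m = 4$ and $m = 3$. The helper names are hypothetical.

```python
def det(M):
    """Exact integer determinant by fraction-free (Bareiss) elimination."""
    M = [list(row) for row in M]
    n = len(M)
    sign, prev = 1, 1
    for i in range(n - 1):
        if M[i][i] == 0:
            swap = next((r for r in range(i + 1, n) if M[r][i] != 0), None)
            if swap is None:
                return 0
            M[i], M[swap] = M[swap], M[i]
            sign = -sign
        for r in range(i + 1, n):
            for c in range(i + 1, n):
                M[r][c] = (M[r][c] * M[i][i] - M[r][i] * M[i][c]) // prev
        prev = M[i][i]
    return sign * M[n - 1][n - 1] if n else 1


H4 = [[1, 1, 1, 1],
      [1, -1, 1, -1],
      [1, 1, -1, -1],
      [1, -1, -1, 1]]    # Hadamard, so any set of its columns gives B^t B = 4 I


def extend(B, r, s):
    """Append the rows (e_r, e_s) and (e_r, -e_s) to B, where B has r + s columns."""
    return [list(row) for row in B] + [[1] * (r + s), [1] * r + [-1] * s]


def gram(A):
    """Return A^t A."""
    m = len(A[0])
    return [[sum(row[j] * row[k] for row in A) for k in range(m)] for j in range(m)]


n = 6
A = extend(H4, 2, 2)                                   # m = 4, r = s = 2
print(gram(A))                                         # block diagonal, blocks 4 I_2 + 2 J_2
print(det(gram(A)), (n - 2) ** 2 * (n - 2 + 4) ** 2)   # both 1024
```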
Suppose now that $m = n$ and
$$A^tA = \begin{pmatrix} L & 0 \\ 0 & L \end{pmatrix},$$
where $L = (n - 2)I_{n/2} + 2J_{n/2}$. If $B$ is the $n \times (n - 1)$ submatrix of $A$ obtained by omitting the last column, then
$$B^tB = \begin{pmatrix} L & 0 \\ 0 & M \end{pmatrix},$$
where $M = (n - 2)I_{n/2-1} + 2J_{n/2-1}$. Thus if the upper bound in Proposition 20 is attained for $m = n$, then it is also attained for $m = n - 1$. Furthermore, since
$$\det(AA^t) = \det(A^tA),$$
it follows from Proposition 20 that there exists a signed permutation matrix $U$ such that
$$UAA^tU^t = A^tA.$$
Replacing $A$ by $UA$, we obtain $AA^t = A^tA$. Then $A$ commutes with $A^tA$. If
$$A = \begin{pmatrix} X & Y \\ Z & W \end{pmatrix}$$
is the partition of $A$ into square submatrices of order $n/2$, it follows that $X, Y, Z, W$ all commute with $L$ and hence with $J_{n/2}$. This means that the entries in any row or any column of $X$ have the same sum, which we will denote by $x$. Similarly the entries in any row or any column of $Y, Z, W$ have the same sum, which will be denoted by $y, z, w$ respectively. We may assume $x, y, w \ge 0$ by replacing $A$ by
$$\begin{pmatrix} I_{n/2} & 0 \\ 0 & \pm I_{n/2} \end{pmatrix} A \begin{pmatrix} \pm I_{n/2} & 0 \\ 0 & \pm I_{n/2} \end{pmatrix}.$$
We have
$$X^tX + Z^tZ = Y^tY + W^tW = L, \qquad X^tY + Z^tW = 0,$$
and
$$XX^t + YY^t = ZZ^t + WW^t = L, \qquad XZ^t + YW^t = 0.$$
Postmultiplying by $J_{n/2}$, we obtain
$$x^2 + z^2 = y^2 + w^2 = 2n - 2, \qquad xy + zw = 0,$$
and
$$x^2 + y^2 = z^2 + w^2 = 2n - 2, \qquad xz + yw = 0.$$
Adding the relations $x^2 + z^2 = y^2 + w^2$ and $x^2 + y^2 = z^2 + w^2$, we obtain $x^2 = w^2$ and hence $x = w$. Thus $z^2 = y^2$ and actually $z = -y$, since $xy + zw = 0$.

This shows, in particular, that if the upper bound in Proposition 20 is attained for $m = n \equiv 2 \bmod 4$, then $2n - 2 = x^2 + y^2$, where $x$ and $y$ are integers. By Proposition II.39, such a representation is possible if and only if, for every prime $p \equiv 3 \bmod 4$, the highest power of $p$ which divides $n - 1$ is even. Hence the upper bound in Proposition 20 is never attained if $m = n = 22$ (since $n - 1 = 21 = 3 \cdot 7$). On the other hand if $m = n = 6$, then $2n - 2 = 10 = 9 + 1$ and an extremal matrix $A$ is obtained by taking $W = X = J_3$ and $Z = -Y = 2I_3 - J_3$.
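The $6 \times 6$ example can be verified directly. The sketch below builds $A = \begin{pmatrix} X & Y \\ Z & W \end{pmatrix}$ from $W = X = J_3$ and $Z = -Y = 2I_3 - J_3$. It checks that $A^tA = AA^t$ is block diagonal with $L = 4I_3 + 2J_3$ and that $x^2 + y^2 = 10$. It also checks $(\det A)^2 = 4^4 \cdot 10^2$, so $|\det A| = 160$.

```python
def det(M):
    """Exact integer determinant by fraction-free (Bareiss) elimination."""
    M = [list(row) for row in M]
    n = len(M)
    sign, prev = 1, 1
    for i in range(n - 1):
        if M[i][i] == 0:
            swap = next((r for r in range(i + 1, n) if M[r][i] != 0), None)
            if swap is None:
                return 0
            M[i], M[swap] = M[swap], M[i]
            sign = -sign
        for r in range(i + 1, n):
            for c in range(i + 1, n):
                M[r][c] = (M[r][c] * M[i][i] - M[r][i] * M[i][c]) // prev
        prev = M[i][i]
    return sign * M[n - 1][n - 1] if n else 1


k = 3
n = 2 * k
J = [[1] * k for _ in range(k)]
Y = [[1 - 2 * (i == j) for j in range(k)] for i in range(k)]   # J_3 - 2 I_3
Z = [[-y for y in row] for row in Y]                            # 2 I_3 - J_3
X = W = J
A = [xr + yr for xr, yr in zip(X, Y)] + [zr + wr for zr, wr in zip(Z, W)]

AtA = [[sum(A[r][i] * A[r][j] for r in range(n)) for j in range(n)] for i in range(n)]
AAt = [[sum(A[i][r] * A[j][r] for r in range(n)) for j in range(n)] for i in range(n)]
# Block diagonal with L = (n - 2) I_3 + 2 J_3 (diagonal 6, off-diagonal 2) in both blocks.
expected = [[(2 + 4 * (i == j)) if (i < k) == (j < k) else 0 for j in range(n)]
            for i in range(n)]
x, y = sum(X[0]), sum(Y[0])
print(AtA == expected, AAt == expected, x * x + y * y == 2 * n - 2)   # True True True
print(abs(det(A)))                                                     # 160
```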

It is an open question whether the upper bound in Proposition 20 is always 
attained when m = n and 2n — 2 is a sum of two squares. It is also unknown if, 
when an extremal matrix exists, one can always take W = X and Z = —Y. 


6 Designs 


A design (in the most general sense) is a pair $(P, \mathcal{B})$, where $P$ is a finite set of elements, called points, and $\mathcal{B}$ is a collection of subsets of $P$, called blocks. If $p_1, \dots, p_v$ are the points of the design and $B_1, \dots, B_b$ the blocks, then the incidence matrix of the design is the $v \times b$ matrix $A = (\alpha_{ij})$ of 0's and 1's defined by
$$\alpha_{ij} = \begin{cases} 1 & \text{if } p_i \in B_j, \\ 0 & \text{if } p_i \notin B_j. \end{cases}$$
Conversely, any $v \times b$ matrix $A = (\alpha_{ij})$ of 0's and 1's defines in this way a design. However, two such matrices define the same design if one can be obtained from the other by permutations of the rows and columns.

We will be interested in designs with rather more structure. A 2-design or, especially in older literature, a 'balanced incomplete block design' (BIBD) is a design, with more than one point and more than one block, in which each block contains the same number $k$ of points, each point belongs to the same number $r$ of blocks, and every pair of distinct points occurs in the same number $\lambda$ of blocks.

Thus each column of the incidence matrix contains $k$ 1's and each row contains $r$ 1's. Counting the total number of 1's in two ways, by columns and by rows, we obtain
$$bk = vr.$$
Similarly, by counting in two ways the 1's which lie below the 1's in the first row, we obtain
$$r(k - 1) = \lambda(v - 1).$$
Thus if $v, k, \lambda$ are given, then $r$ and $b$ are determined and we may speak of a 2-$(v, k, \lambda)$ design. Since $v > 1$ and $b > 1$, we have
$$1 \le k < v, \qquad 1 \le \lambda < r.$$
Fig. 1. The Fano plane.


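A $v \times b$ matrix $A = (\alpha_{ik})$ of 0's and 1's is the incidence matrix of a 2-design if and only if, for some positive integers $k, r, \lambda$,
$$\sum_{i=1}^{v}\alpha_{ik} = k, \qquad \sum_{k=1}^{b}\alpha_{ik} = r, \qquad \sum_{k=1}^{b}\alpha_{ik}\alpha_{jk} = \lambda \ \text{ if } i \neq j \quad (1 \le i, j \le v),$$
or in other words,
$$e_vA = ke_b, \qquad AA^t = (r - \lambda)I_v + \lambda J_v, \qquad (5)$$
where $e_n$ is the $1 \times n$ matrix with all entries 1, $I_n$ is the $n \times n$ unit matrix and $J_n$ is the $n \times n$ matrix with all entries 1.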

Designs have been used extensively in the design of agricultural and other experiments. To compare the yield of $v$ varieties of a crop on $b$ blocks of land, it would be expensive to test each variety separately on each block. Instead we can divide each block into $k$ plots and use a 2-$(v, k, \lambda)$ design, where $\lambda = bk(k - 1)/v(v - 1)$. Then each variety is used exactly $r = bk/v$ times, no variety is used more than once in any block, and any two varieties are used together in exactly $\lambda$ blocks. As an example, take $v = 4$, $b = 6$, $k = 2$ and hence $\lambda = 1$, $r = 3$.
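Conditions (5) translate directly into a checker. The following minimal sketch uses a hypothetical function name. It returns the parameters $(v, b, r, k, \lambda)$ of a 0/1 matrix satisfying (5), or `None` otherwise. It is applied here to the agricultural example, in which the blocks are all 6 pairs of 4 varieties.

```python
from itertools import combinations


def design_parameters(A):
    """Return (v, b, r, k, lam) if the 0/1 matrix A satisfies (5), otherwise None."""
    v, b = len(A), len(A[0])
    col_sums = {sum(A[i][j] for i in range(v)) for j in range(b)}
    if len(col_sums) != 1:
        return None                                   # blocks of different sizes
    k = col_sums.pop()
    G = [[sum(A[i][j] * A[h][j] for j in range(b)) for h in range(v)] for i in range(v)]
    diag = {G[i][i] for i in range(v)}                               # r on the diagonal
    off = {G[i][h] for i in range(v) for h in range(v) if i != h}    # lambda elsewhere
    if len(diag) != 1 or len(off) != 1:
        return None
    return v, b, diag.pop(), k, off.pop()


# v = 4 varieties; the b = 6 blocks of land are all pairs of varieties (k = 2).
blocks = list(combinations(range(4), 2))
A = [[1 if i in B else 0 for B in blocks] for i in range(4)]
print(design_parameters(A))   # (4, 6, 3, 2, 1)
```

The function does not insist that $\lambda$ be positive; that extra condition is easy to add if wanted.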

Some examples of 2-designs are the finite projective planes. In fact a projective plane of order $n$ may be defined as a 2-$(v, k, \lambda)$ design with
$$v = n^2 + n + 1, \qquad k = n + 1, \qquad \lambda = 1.$$
It follows that $b = v$ and $r = k$. The blocks in this case are called 'lines'. The projective plane of order 2, or Fano plane, is illustrated in Figure 1. There are seven points and seven blocks, the blocks being the six triples of collinear points and the triple of points on the circle.

Consider now an arbitrary 2-$(v, k, \lambda)$ design. By (5) and Lemma 15,
$$\det(AA^t) = (r - \lambda)^{v-1}(r - \lambda + \lambda v) > 0,$$
since $r > \lambda$. This implies the inequality $b \ge v$, due to Fisher (1940), since $AA^t$ has rank at most $b$ and would therefore be singular if $b < v$.
A 2-design is said to be square or (more commonly, but misleadingly) 'symmetric' if $b = v$, i.e. if the number of blocks is the same as the number of points. Thus any projective plane is a square 2-design.

For a square 2-$(v, k, \lambda)$ design, $k = r$ and the incidence matrix $A$ is itself nonsingular. The first relation (5) is now equivalent to $J_vA = kJ_v$. Since $k = r$, the sum of the entries in any row of $A$ is also $k$ and thus $J_vA^t = kJ_v$, i.e. $AJ_v = kJ_v$. Hence $A^{-1}J_vA = J_v$, and by multiplying the second relation (5) on the left by $A^{-1}$ and on the right by $A$, we further obtain
$$A^tA = (k - \lambda)I_v + \lambda J_v.$$
Thus $A^t$ is also the incidence matrix of a square 2-$(v, k, \lambda)$ design, the dual of the given design.
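Square designs are conveniently produced from perfect difference sets. The sketch relies on the standard facts that $\{0, 1, 3\}$ mod 7 and $\{0, 1, 3, 9\}$ mod 13 are perfect difference sets, giving the projective planes of orders 2 and 3. For each of them it checks $AA^t = A^tA = (k - \lambda)I + \lambda J$ and the row and column sums. It also checks the determinant formula $\det(AA^t) = (r - \lambda)^{v-1}(r - \lambda + \lambda v)$ used for Fisher's inequality. The helper names are our own.

```python
def det(M):
    """Exact integer determinant by fraction-free (Bareiss) elimination."""
    M = [list(row) for row in M]
    n = len(M)
    sign, prev = 1, 1
    for i in range(n - 1):
        if M[i][i] == 0:
            swap = next((r for r in range(i + 1, n) if M[r][i] != 0), None)
            if swap is None:
                return 0
            M[i], M[swap] = M[swap], M[i]
            sign = -sign
        for r in range(i + 1, n):
            for c in range(i + 1, n):
                M[r][c] = (M[r][c] * M[i][i] - M[r][i] * M[i][c]) // prev
        prev = M[i][i]
    return sign * M[n - 1][n - 1] if n else 1


def difference_set_design(v, D):
    """Blocks B_j = {j + d mod v : d in D}: point i lies in B_j iff i - j mod v is in D."""
    return [[1 if (i - j) % v in D else 0 for j in range(v)] for i in range(v)]


def times_transpose(X, Y):
    """Return X Y^t."""
    return [[sum(a * b for a, b in zip(r, s)) for s in Y] for r in X]


def square_design_checks(v, D):
    """Check (5) and its dual form for the difference-set design; return (A, k, lam)."""
    A = difference_set_design(v, D)
    k = len(D)
    lam = k * (k - 1) // (v - 1)              # r (k - 1) = lambda (v - 1) with r = k
    target = [[k if i == j else lam for j in range(v)] for i in range(v)]
    At = [list(col) for col in zip(*A)]
    assert times_transpose(A, A) == target    # A A^t = (k - lam) I + lam J
    assert times_transpose(At, At) == target  # A^t A: the dual design
    assert all(sum(col) == k for col in At)   # J A = k J
    assert all(sum(row) == k for row in A)    # J A^t = k J
    return A, k, lam


for v, D in [(7, {0, 1, 3}), (13, {0, 1, 3, 9})]:
    A, k, lam = square_design_checks(v, D)
    print(v, k, lam, det(times_transpose(A, A)) == (k - lam) ** (v - 1) * (k - lam + lam * v))
```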
This partly combinatorial argument may be replaced by a more general matrix one: 


Lemma 21 Let $a, b, k$ be real numbers and $n > 1$ an integer. There exists a nonsingular real $n \times n$ matrix $A$ such that
$$AA^t = aI + bJ, \qquad JA = kJ, \qquad (6)$$
if and only if $a > 0$, $a + bn > 0$ and $k^2 = a + bn$. Moreover any such matrix $A$ also satisfies
$$A^tA = aI + bJ, \qquad JA^t = kJ. \qquad (7)$$


Proof We show first that if $A$ is any real $n \times n$ matrix satisfying (6), then $a + bn = k^2$. In fact, since $J^2 = nJ$, the first relation in (6) implies $JAA^tJ = (a + bn)nJ$, whereas the second implies $JAA^tJ = k^2nJ$.

We show next that the symmetric matrix $G := aI + bJ$ is positive definite if and only if $a > 0$ and $a + bn > 0$. By Lemma 15, $\det G = a^{n-1}(a + bn)$. If $G$ is positive definite, its determinant is positive. Since all principal submatrices are also positive definite, we must have $a^{i-1}(a + bi) > 0$ for $1 \le i \le n$. In particular, $a + b > 0$, $a(a + 2b) > 0$, which is only possible if $a > 0$. It now follows that also $a + bn > 0$.

Conversely, suppose $a > 0$ and $a + bn > 0$. Then $\det G > 0$ and there exist nonzero real numbers $h, k$ such that $a = h^2$, $a + bn = k^2$. If we put $C = hI + (k - h)n^{-1}J$, then $JC = kJ$ and, since $2h(k - h) + (k - h)^2 = k^2 - h^2 = bn$,
$$C^2 = h^2I + \{2h(k - h) + (k - h)^2\}n^{-1}J = aI + bJ = G.$$
Since $\det G > 0$, this shows that $G = CC^t$ is positive definite and $C$ is nonsingular.

Finally, let A be any nonsingular real n x n matrix satisfying (6). Since A is nonsin- 
gular, AA’ is a positive definite symmetric matrix and hence a > 0,a+bn > 0. Since 
AA! = C? and C' = C, we have A = CU, where U is orthogonal. Hence A’ = U'C 
and C = UA‘. From JC = kJ we obtainkJ = JA = JCU =kJU. Thus J = JU 
and JA! = JUA' = JC =kJ. Moreover U' JU = J, since J‘ = J, and hence 


AA=U'CU =U (al +bJ)U =al +b. 
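The converse construction in the proof can be checked numerically. The Python sketch below uses helper names of our own (`matmul`, `lemma21_c`), which are not from the text. It builds $C = hI + (k - h)n^{-1}J$ with $h = \sqrt{a}$ and $k = \sqrt{a + bn}$, then confirms, up to floating-point rounding, that $C^2 = aI + bJ$ and $JC = kJ$.

```python
import math


def matmul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x[i][l] * y[l][j] for l in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def lemma21_c(a, b, n):
    """Return (C, k) with C = hI + (k - h)/n * J, h = sqrt(a), k = sqrt(a + bn).

    Assumes a > 0 and a + b*n > 0, as in the converse part of Lemma 21.
    """
    if a <= 0 or a + b * n <= 0:
        raise ValueError("need a > 0 and a + bn > 0")
    h, k = math.sqrt(a), math.sqrt(a + b * n)
    c = [[(h if i == j else 0.0) + (k - h) / n for j in range(n)] for i in range(n)]
    return c, k


if __name__ == "__main__":
    a, b, n = 3.0, 2.0, 5
    c, k = lemma21_c(a, b, n)
    c2 = matmul(c, c)
    # C^2 should equal aI + bJ
    assert all(abs(c2[i][j] - (a * (i == j) + b)) < 1e-9
               for i in range(n) for j in range(n))
    # every column sum of C equals k, i.e. JC = kJ
    assert all(abs(sum(c[i][j] for i in range(n)) - k) < 1e-9 for j in range(n))
    print("C^2 = aI + bJ and JC = kJ hold; k =", k)
```

Because $C$ is symmetric, $C^2 = CC^t$. This is exactly the factorization of $G$ used in the last paragraph of the proof.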


In Chapter VII we will derive necessary and sufficient conditions for the existence of a nonsingular rational $n \times n$ matrix $A$ such that $AA^t = aI + bJ$, and thus in particular obtain some basic restrictions on the parameters $v, k, \lambda$ for the existence of a square 2-$(v, k, \lambda)$ design. These were first obtained by Bruck, Ryser and Chowla (1949/50).
We now consider the relationship between designs and Hadamard’s determinant problem. By passing from $A$ to $B = (J_n - A^t)/2$, it may be seen immediately that equality holds in Proposition 19 if and only if there exists a 2-$(n, k, \lambda)$ design, where $k = (n - s)/2$, $\lambda = (n + 1 - 2s)/4$ and $s^2 = 2n - 1$.

We now show that with any Hadamard matrix $A = (a_{jk})$ of order $n = 4d$ there is associated a 2-$(4d - 1, 2d - 1, d - 1)$ design. Assume without loss of generality that all elements in the first row and column of $A$ are 1. We take $P = \{2, \ldots, n\}$ as the set of points and $\mathcal{B} = \{B_2, \ldots, B_n\}$ as the set of blocks, where $B_k = \{j \in P : a_{jk} = 1\}$. Then $B_k$ has cardinality $|B_k| = n/2 - 1$ for $k = 2, \ldots, n$. Moreover, if $T$ is any subset of $P$ with $|T| = 2$, then the number of blocks containing $T$ is $n/4 - 1$. The argument may also be reversed to show that any 2-$(4d - 1, 2d - 1, d - 1)$ design is associated in this way with a Hadamard matrix of order $4d$.
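The construction is easy to test by machine. The sketch below takes the Sylvester matrices $H_2 \otimes \cdots \otimes H_2$ as example Hadamard matrices; these are already normalized. The helper names are our own. The code forms the blocks $B_k = \{j \in P : a_{jk} = 1\}$ and counts how many blocks contain each pair of points.

```python
from itertools import combinations

H2 = [[1, 1], [1, -1]]


def kron(x, y):
    """Kronecker product of two matrices given as lists of rows."""
    return [[xi * yj for xi in xr for yj in yr] for xr in x for yr in y]


def sylvester(m):
    """H_2 (x) ... (x) H_2 with m factors; its first row and column consist of 1's."""
    h = [[1]]
    for _ in range(m):
        h = kron(h, H2)
    return h


def hadamard_2design(a):
    """Points P = {2, ..., n} and blocks B_k = {j in P : a_jk = 1}, k = 2, ..., n (1-based)."""
    n = len(a)
    points = list(range(2, n + 1))
    blocks = [frozenset(j for j in points if a[j - 1][k - 1] == 1) for k in range(2, n + 1)]
    return points, blocks


def pair_counts(points, blocks):
    """The set of numbers of blocks containing T, over all 2-subsets T of the points."""
    return {sum(1 for blk in blocks if set(t) <= blk) for t in combinations(points, 2)}


if __name__ == "__main__":
    for m in (3, 4):
        pts, blks = hadamard_2design(sylvester(m))
        print(2 ** m, {len(b) for b in blks}, pair_counts(pts, blks))
```

Run as a script, it prints `8 {3} {1}` and `16 {7} {3}`. These are the 2-$(7, 3, 1)$ design (the Fano plane) and a 2-$(15, 7, 3)$ design, as the formulas with $d = 2$ and $d = 4$ predict.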

In particular, for $d = 2$, the 2-$(7, 3, 1)$ design associated with the Hadamard matrix $H_2 \otimes H_2 \otimes H_2$, where

$$H_2 = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix},$$

is the projective plane of order 2 (Fano plane) illustrated in Figure 1.
The connection between Hadamard matrices and designs may also be derived by a matrix argument. If

$$H = \begin{pmatrix} 1 & e_{n-1}^t \\ e_{n-1} & A \end{pmatrix},$$

where $e_{n-1}$ is the column vector of $n - 1$ 1's, is a Hadamard matrix of order $n = 4d$, normalized so that its first row and column contain only 1's, then $B = (J_{n-1} + A)/2$ is a matrix of 0's and 1's such that

$$J_{4d-1}B = (2d - 1)J_{4d-1}, \qquad BB^t = dI_{4d-1} + (d - 1)J_{4d-1}.$$

The optimal spring balance design of order $4d - 1$, which is obtained by taking $C = (J_{n-1} - A)/2$, is a 2-$(4d - 1, 2d, d)$ design, since

$$J_{4d-1}C = 2dJ_{4d-1}, \qquad CC^t = dI_{4d-1} + dJ_{4d-1}.$$
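Both pairs of identities can be confirmed directly. This sketch again uses a Sylvester matrix as the normalized Hadamard matrix; the helper names are ours. It strips off the first row and column to obtain $A$, forms $B = (J_{n-1} + A)/2$ and $C = (J_{n-1} - A)/2$, and compares their column sums and Gram matrices with the displayed formulas.

```python
def kron(x, y):
    """Kronecker product of two matrices given as lists of rows."""
    return [[xi * yj for xi in xr for yj in yr] for xr in x for yr in y]


def sylvester(m):
    """H_2 (x) ... (x) H_2 (m factors), normalized with first row and column all 1."""
    h = [[1]]
    for _ in range(m):
        h = kron(h, [[1, 1], [1, -1]])
    return h


def gram(x):
    """X X^t for a matrix X given as a list of rows."""
    return [[sum(p * q for p, q in zip(r, s)) for s in x] for r in x]


def check_spring_balance(m):
    """Check JB = (2d-1)J, BB^t = dI + (d-1)J, JC = 2dJ, CC^t = dI + dJ for n = 2^m = 4d (m >= 2)."""
    n = 2 ** m
    d = n // 4
    a = [row[1:] for row in sylvester(m)[1:]]          # the block A of H
    b = [[(1 + e) // 2 for e in row] for row in a]     # B = (J + A)/2
    c = [[(1 - e) // 2 for e in row] for row in a]     # C = (J - A)/2
    size = n - 1
    gb, gc = gram(b), gram(c)
    return (all(sum(row[j] for row in b) == 2 * d - 1 for j in range(size))
            and all(sum(row[j] for row in c) == 2 * d for j in range(size))
            and all(gb[i][j] == d * (i == j) + d - 1 for i in range(size) for j in range(size))
            and all(gc[i][j] == d * (i == j) + d for i in range(size) for j in range(size)))


if __name__ == "__main__":
    print([check_spring_balance(m) for m in (2, 3, 4)])
```

Only integer arithmetic is used, so the comparisons are exact.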


The notion of 2-design will now be generalized. Let $t, v, k, \lambda$ be positive integers with $v > k > t$. A $t$-$(v, k, \lambda)$ design, or simply a $t$-design, is a pair $(P, \mathcal{B})$, where $P$ is a set of cardinality $v$ and $\mathcal{B}$ is a collection of subsets of $P$, each of cardinality $k$, such that any subset of $P$ of cardinality $t$ is contained in exactly $\lambda$ elements of $\mathcal{B}$. The elements of $P$ will be called points and the elements of $\mathcal{B}$ will be called blocks. A $t$-$(v, k, \lambda)$ design with $\lambda = 1$ is known as a Steiner system. The automorphism group of a $t$-design is the group of all permutations of the points which map blocks onto blocks.

If $t = 1$, then each point is contained in exactly $\lambda$ blocks and so the number of blocks is $\lambda v/k$. Suppose now that $t > 1$. Let $S$ be a fixed subset of $P$ of cardinality $t - 1$ and let $\lambda'$ be the number of blocks which contain $S$. Consider the number of pairs $(T, B)$, where $B \in \mathcal{B}$, $S \subset T \subset B$ and $|T| = t$. By first fixing $B$ and varying $T$ we see that this number is $\lambda'(k - t + 1)$. On the other hand, by first fixing $T$ and varying $B$ we see that this number is $\lambda(v - t + 1)$. Hence

$$\lambda' = \lambda(v - t + 1)/(k - t + 1)$$

does not depend on the choice of $S$ and a $t$-$(v, k, \lambda)$ design $(P, \mathcal{B})$ is also a $(t - 1)$-$(v, k, \lambda')$ design. By repeating this argument, we see that each point is contained in exactly $r$ blocks, where

$$r = \lambda\,\frac{(v - 1)(v - 2)\cdots(v - t + 1)}{(k - 1)(k - 2)\cdots(k - t + 1)},$$

and the total number of blocks is $b = rv/k$. In particular, any $t$-design with $t \ge 2$ is also a 2-design.
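The recursion behind the formula for $r$ can be run mechanically. The following sketch uses a hypothetical function name. It applies $\lambda_{s-1} = \lambda_s(v - s + 1)/(k - s + 1)$ from $s = t$ down to $s = 2$ in exact rational arithmetic. Since each $\lambda_s$ counts blocks, a non-integral value immediately rules a parameter set out.

```python
from fractions import Fraction


def design_counts(t, v, k, lam):
    """Return ({s: lambda_s for s = t, ..., 1}, b) for a putative t-(v, k, lam) design.

    Uses lambda_{s-1} = lambda_s (v - s + 1)/(k - s + 1); r = lambda_1 and b = r v / k.
    Values are exact Fractions, so non-integers are visible directly.
    """
    lam_s = {t: Fraction(lam)}
    for s in range(t, 1, -1):
        lam_s[s - 1] = lam_s[s] * (v - s + 1) / (k - s + 1)
    r = lam_s[1]
    return lam_s, r * v / k


if __name__ == "__main__":
    for params in [(5, 12, 6, 1), (5, 24, 8, 1), (2, 7, 3, 1)]:
        counts, b = design_counts(*params)
        print(params, "r =", counts[1], "b =", b)
```

For example, 5-$(12, 6, 1)$ gives $r = 66$ and $b = 132$, and 5-$(24, 8, 1)$ gives $r = 253$ and $b = 759$. By contrast, 2-$(8, 3, 1)$ would need $r = 7/2$, so no such design exists.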

With any Hadamard matrix $A = (a_{jk})$ of order $n = 4d$ there is, in addition, associated a 3-$(4d, 2d, d - 1)$ design. For assume without loss of generality that all elements in the first column of $A$ are 1. We take $P = \{1, 2, \ldots, n\}$ as the set of points and $\{B_2, \ldots, B_n, B'_2, \ldots, B'_n\}$ as the set of blocks, where $B_k = \{j \in P : a_{jk} = 1\}$ and $B'_k = \{j \in P : a_{jk} = -1\}$. Then, by Proposition 5, $|B_k| = |B'_k| = n/2$ for $k = 2, \ldots, n$. If $T$ is any subset of $P$ with $|T| = 3$, say $T = \{i, j, \ell\}$, then the number of blocks containing $T$ is the number of $k > 1$ such that $a_{ik} = a_{jk} = a_{\ell k}$. But, by Proposition 5 again, the number of columns of $A$ which have the same entries in rows $i, j, \ell$ is $n/4$ and this includes the first column. Hence $T$ is contained in exactly $n/4 - 1$ blocks. Again the argument may be reversed to show that any 3-$(4d, 2d, d - 1)$ design is associated in this way with a Hadamard matrix of order $4d$.
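The same kind of machine check works for the 3-design. As before, the sketch assumes Sylvester matrices, whose first column consists of 1's, and the helper names are ours. Each column $k \ge 2$ contributes the block of rows carrying $+1$ and the block of rows carrying $-1$.

```python
from itertools import combinations


def kron(x, y):
    """Kronecker product of two matrices given as lists of rows."""
    return [[xi * yj for xi in xr for yj in yr] for xr in x for yr in y]


def sylvester(m):
    """H_2 (x) ... (x) H_2 with m factors; its first column consists of 1's."""
    h = [[1]]
    for _ in range(m):
        h = kron(h, [[1, 1], [1, -1]])
    return h


def hadamard_3design(a):
    """Blocks B_k and B'_k (k = 2, ..., n) on the points {1, ..., n}."""
    n = len(a)
    blocks = []
    for k in range(1, n):  # 0-based index of the columns 2, ..., n
        blocks.append(frozenset(i + 1 for i in range(n) if a[i][k] == 1))
        blocks.append(frozenset(i + 1 for i in range(n) if a[i][k] == -1))
    return blocks


def triple_counts(n, blocks):
    """The set of numbers of blocks containing T, over all 3-subsets T of {1, ..., n}."""
    return {sum(1 for blk in blocks if set(t) <= blk)
            for t in combinations(range(1, n + 1), 3)}


if __name__ == "__main__":
    for m in (3, 4):
        blks = hadamard_3design(sylvester(m))
        print(2 ** m, len(blks), {len(b) for b in blks}, triple_counts(2 ** m, blks))
```

For order 8 this gives the 3-$(8, 4, 1)$ design with 14 blocks. For order 16 it gives a 3-$(16, 8, 3)$ design with 30 blocks.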
A group is said to be simple if it contains more than one element and has no normal subgroups besides itself and the subgroup containing only the identity element. The finite simple groups are in some sense the building blocks from which all finite groups are constructed. There are several infinite families of them: the cyclic groups $C_p$ of prime order $p$, the alternating groups $\mathcal{A}_n$ of all even permutations of $n$ objects ($n \ge 5$), the groups $PSL_n(q)$ derived from the general linear groups of all invertible linear transformations of an $n$-dimensional vector space over a finite field of $q = p^m$ elements ($n \ge 2$, and $q > 3$ if $n = 2$), and some other families similar to the last which are analogues for a finite field of the simple Lie groups.

In addition to these infinite families there are 26 sporadic finite simple groups. (The classification theorem states that there are no other finite simple groups besides those already mentioned. The proof of the classification theorem at present occupies thousands of pages, scattered over a variety of journals, and some parts are actually still unpublished.) All except five of the sporadic groups were found in the years 1965–1981. The other five, which were the first to be found, are due to Mathieu (1861, 1873). $M_{12}$ is a 5-fold transitive group of permutations of 12 objects, of order $12 \cdot 11 \cdot 10 \cdot 9 \cdot 8$, and $M_{11}$ is the subgroup of all permutations in $M_{12}$ which fix one of the objects. $M_{24}$ is a 5-fold transitive group of permutations of 24 objects, of order $24 \cdot 23 \cdot 22 \cdot 21 \cdot 20 \cdot 48$; $M_{23}$ is the subgroup of all permutations in $M_{24}$ which fix one of the objects, and $M_{22}$ the subgroup of all permutations in $M_{24}$ which fix two of the objects. The Mathieu groups may be defined in several ways, but the definitions by means of Hadamard matrices that we are going to give are certainly competitive with the others.
Two $n \times n$ Hadamard matrices $H_1, H_2$ are said to be equivalent if one may be obtained from the other by interchanging two rows or two columns, or by changing the sign of a row or a column, or by any finite number of such operations. Otherwise expressed, $H_2 = PH_1Q$, where $P$ and $Q$ are signed permutation matrices. An automorphism of a Hadamard matrix $H$ is an equivalence of $H$ with itself: $H = PHQ$. Since $P = HQ^{-1}H^{-1}$, the automorphism is uniquely determined by $Q$. Under matrix multiplication all admissible $Q$ form a group $\mathcal{G}$, the automorphism group of the Hadamard matrix $H$. Evidently $-I \in \mathcal{G}$ and $-I$ commutes with all elements of $\mathcal{G}$. The factor group $\mathcal{G}/\{\pm I\}$, obtained by identifying $Q$ and $-Q$, may be called the reduced automorphism group of $H$.

To illustrate these concepts we will show that all Hadamard matrices of order 12 
are equivalent. In fact rather more is true: 


Proposition 22 Any Hadamard matrix of order 12 may be brought to the form 


+++ +++ +++ +++
+++ +++ --- ---
+++ --- +++ ---
+-+ -+- -+- +-+
++- --+ --+ +-+
-++ +-- +-- +-+    (*)
+-+ --+ +-- ++-
+-+ +-- --+ -++
++- -+- +-- -++
-++ -+- --+ ++-
++- +-- -+- ++-
-++ --+ -+- -++


(where + stands for 1 and - for −1) by changing the signs of some rows and columns, by permuting the columns, and by permuting the first three rows and the last seven rows.


Proof Let $A = (a_{jk})$ be a Hadamard matrix of order 12. By changing the signs of some columns we may assume that all elements of the first row are $+1$. Then, by the orthogonality relations, half the elements of any other row are $+1$. By permuting the columns we may assume that all elements in the first half of the second row are $+1$. It now follows from the orthogonality relations that in any row after the second the sum of all elements in each half is zero. Hence, by permuting the columns within each half we may assume that the third row is the same as the third row of the array (*) displayed above. In the $r$-th row, where $r > 3$, let $\rho_k$ be the sum of the entries in the $k$-th block of three columns ($k = 1, 2, 3, 4$). The orthogonality relations now imply that

$$\rho_1 = \rho_4 = -\rho_2 = -\rho_3.$$

In the $s$-th row, where $s > 3$ and $s \ne r$, let $\sigma_k$ be the sum of the entries in the $k$-th block of three columns. Then also

$$\sigma_1 = \sigma_4 = -\sigma_2 = -\sigma_3.$$
If $\rho_1 = \pm 3$, then in the $r$-th row all three elements of each triple of columns have the same sign, and orthogonality to the $s$-th row implies $\sigma_1 = 0$, which is impossible because $\sigma_1$ is odd. Hence $\rho_1 = \pm 1$. By changing the signs of some rows we may assume that $\rho_1 = 1$ for every $r > 3$. By permuting columns within each block of three we may also normalize the 4-th row, so that the first four rows are now the same as the first four rows of the array (*).

In any row after the third, within a given block of three columns two elements have the same sign and the third element the opposite sign. Moreover, these signs depend only on the block and not on the row, since $\rho_1 = 1$. The scalar product of the triples from two different rows belonging to the same block of columns is 3 if the exceptional elements have the same position in the triple and is $-1$ otherwise. Since the two rows are orthogonal, the exceptional elements must have the same position in exactly one of the four blocks of columns. Thus if two rows after the 4-th have the same triple of elements in the $k$-th block as the 4-th row, then they have no other triple in common with the 4-th row or with one another. But this implies that if one of the two rows is given, then the other is uniquely determined. Hence no other row besides these two has the same triple of elements in the $k$-th block as the 4-th row. Since there are eight rows after the 4-th, and since each has exactly one triple in common with the 4-th row, it follows that, for each $k \in \{1, 2, 3, 4\}$, exactly two of them have the same triple in the $k$-th block as the 4-th row.

The first four rows are unaltered by the following operations: 


(i) interchange of the first and last columns of any triple of columns, 
(ii) interchange of the second and third triple of columns, and then interchange of the 
second and third rows, 
(iii) interchange of the first and fourth triple of columns, then interchange of the sec- 
ond and third rows and change of sign of these two rows, 
(iv) interchange of the second and fourth triple of columns and change of their signs, 
then interchange of the first and third rows. 


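If we denote the elements of the $r$-th row ($r > 4$) by $\varepsilon_1, \ldots, \varepsilon_{12}$, then we have

$$\varepsilon_1 + \varepsilon_2 + \varepsilon_3 = 1 = \varepsilon_{10} + \varepsilon_{11} + \varepsilon_{12},$$
$$\varepsilon_4 + \varepsilon_5 + \varepsilon_6 = -1 = \varepsilon_7 + \varepsilon_8 + \varepsilon_9,$$
$$\varepsilon_2 - \varepsilon_5 - \varepsilon_8 + \varepsilon_{11} = 2,$$

the last relation coming from orthogonality to the 4-th row. In particular in the 5-th row we have $a_{52} - a_{55} - a_{58} + a_{5,11} = 2$. Thus $a_{52}$ and $a_{5,11}$ cannot both be $-1$ and by an operation (iii) we may assume that $a_{52} = 1$. Similarly $a_{55}$ and $a_{58}$ cannot both be 1 and by an operation (ii) we may assume that $a_{58} = -1$. Then $a_{55} = a_{5,11}$ and by an operation (iv) we may assume that $a_{55} = a_{5,11} = -1$. By operations (i) we may finally assume that the 5-th row is the same as the 5-th row of the array (*).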

As we have already shown, exactly one row after the 5-th row has the same triple $+ - +$ in the last block of columns as the 4-th and 5-th rows and this row must be the same as the 6-th row of the array (*). By permuting the last seven rows we may assume that this row is also the 6-th row of the given matrix, that the 7-th and 8-th rows have the same first triple of elements as the 4-th row, that the 9-th and 10-th rows have the same second triple of elements as the 4-th row, and that the 11-th and 12-th rows have the same third triple of elements as the 4-th row.

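In any row after the 6-th we have, in addition to the relations displayed above, $\varepsilon_{11} = 1$, $\varepsilon_{10} + \varepsilon_{12} = 0$ and

$$\varepsilon_1 - \varepsilon_4 - \varepsilon_7 = \varepsilon_2 - \varepsilon_5 - \varepsilon_8 = \varepsilon_3 - \varepsilon_6 - \varepsilon_9 = 1.$$

In the 7-th and 8-th rows we have $\varepsilon_1 = \varepsilon_3 = 1$, $\varepsilon_2 = -1$, and hence $\varepsilon_5 = \varepsilon_8 = -1$, $\varepsilon_4 = -\varepsilon_6 = -\varepsilon_7 = \varepsilon_9$. Since the first six rows are still unaltered by an operation (ii), and also by interchanging the first and third columns of the last block, we may assume that $a_{74} = -1$, $a_{7,10} = 1$. The 7-th and 8-th rows are now uniquely determined and are the same as the 7-th and 8-th rows of the array (*).

In any row after the 8-th we have

$$\varepsilon_2 - \varepsilon_6 - \varepsilon_7 + \varepsilon_{12} = 2 = \varepsilon_2 - \varepsilon_4 - \varepsilon_9 + \varepsilon_{10}.$$

In the 9-th and 10-th rows we have $\varepsilon_5 = \varepsilon_{11} = 1$ and $\varepsilon_4 = \varepsilon_6 = -1$. Hence $\varepsilon_2 = -\varepsilon_8 = 1$, $\varepsilon_1 = \varepsilon_7 = -\varepsilon_3 = -\varepsilon_9$, and finally $\varepsilon_9 = \varepsilon_{10} = -\varepsilon_{12}$. Thus the 9-th and 10-th rows are together uniquely determined and may be ordered so as to coincide with the corresponding rows of the array (*). Similarly the 11-th and 12-th rows are together uniquely determined and may be ordered so as to coincide with the corresponding rows of the displayed array.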


It follows from Proposition 22 that, for any five distinct rows of a Hadamard ma- 
trix of order 12, there exists exactly one pair of columns which either agree in all these 
rows or disagree in all these rows. Indeed, by permuting the rows we may arrange that 
the five given rows are the first five rows. Now, by Proposition 22, we may assume that 
the matrix has the form (*). But it is evident that in this case there is exactly one pair 
of columns which either agree or disagree in all the first five rows, namely the 10-th 
and 12-th columns. 
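Both the array (*) and the claim about five rows can be checked by machine. The sketch below is our own transcription of the array into Python, with helper names of our choosing. It verifies that the rows are mutually orthogonal. For a chosen set of rows, it then lists the column pairs that agree in all of them or disagree in all of them.

```python
from itertools import combinations

# Array (*) of Proposition 22, one string per row ('+' = 1, '-' = -1).
STAR = [
    "+++ +++ +++ +++",
    "+++ +++ --- ---",
    "+++ --- +++ ---",
    "+-+ -+- -+- +-+",
    "++- --+ --+ +-+",
    "-++ +-- +-- +-+",
    "+-+ --+ +-- ++-",
    "+-+ +-- --+ -++",
    "++- -+- +-- -++",
    "-++ -+- --+ ++-",
    "++- +-- -+- ++-",
    "-++ --+ -+- -++",
]
A = [[1 if ch == "+" else -1 for ch in row.replace(" ", "")] for row in STAR]


def is_hadamard(a):
    """True if the rows of a are mutually orthogonal, i.e. a a^t = n I."""
    n = len(a)
    return all(sum(x * y for x, y in zip(a[i], a[j])) == (n if i == j else 0)
               for i in range(n) for j in range(n))


def special_pairs(a, rows):
    """1-based column pairs (j, k) that agree in all given rows or disagree in all of them."""
    out = []
    for j, k in combinations(range(len(a[0])), 2):
        if len({a[i][j] * a[i][k] for i in rows}) == 1:
            out.append((j + 1, k + 1))
    return out
```

For the first five rows, `special_pairs(A, range(5))` returns `[(10, 12)]`. Looping over all $\binom{12}{5} = 792$ sets of five rows gives exactly one pair each time.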

Hence a 5-$(12, 6, 1)$ design is obtained by taking the points to be elements of the set $P = \{1, \ldots, 12\}$ and the blocks to be the $12 \cdot 11$ subsets $B_{jk}, B'_{jk}$ with $j, k \in P$ and $j < k$, where

$$B_{jk} = \{i \in P : a_{ij} = a_{ik}\}, \qquad B'_{jk} = \{i \in P : a_{ij} \ne a_{ik}\}.$$

The Mathieu group $M_{12}$ may be defined as the automorphism group of this design or as the reduced automorphism group of any Hadamard matrix of order 12.
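The Steiner system itself can be generated from the array (*). The following sketch uses our own transcription of the array and hypothetical helper names. It forms the blocks $B_{jk}$ and $B'_{jk}$ for $j < k$ and checks that every 5-subset of the 12 points lies in exactly one block.

```python
from itertools import combinations

# Array (*) of Proposition 22, one string per row ('+' = 1, '-' = -1).
STAR = [
    "+++ +++ +++ +++", "+++ +++ --- ---", "+++ --- +++ ---",
    "+-+ -+- -+- +-+", "++- --+ --+ +-+", "-++ +-- +-- +-+",
    "+-+ --+ +-- ++-", "+-+ +-- --+ -++", "++- -+- +-- -++",
    "-++ -+- --+ ++-", "++- +-- -+- ++-", "-++ --+ -+- -++",
]
A = [[1 if ch == "+" else -1 for ch in row.replace(" ", "")] for row in STAR]


def steiner_blocks(a):
    """Blocks B_jk = {i : a_ij = a_ik} and B'_jk = {i : a_ij != a_ik}, j < k; points are rows 1..12."""
    n = len(a)
    blocks = []
    for j, k in combinations(range(n), 2):
        blocks.append(frozenset(i + 1 for i in range(n) if a[i][j] == a[i][k]))
        blocks.append(frozenset(i + 1 for i in range(n) if a[i][j] != a[i][k]))
    return blocks


def blocks_through(blocks, s):
    """Number of blocks containing the set s."""
    return sum(1 for blk in blocks if set(s) <= blk)
```

There are 132 distinct blocks of size 6. This agrees with $b = 132$ for a 5-$(12, 6, 1)$ design.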

It is certainly not true in general that all Hadamard matrices of the same order n 
are equivalent. For example, there are 60 equivalence classes of Hadamard matrices of 
order 24. The Mathieu group M24 is connected with the Hadamard matrix of order 24 
which is constructed by Paley’s method, described in §2. The connection is not as 
immediate as for M12, but the ideas involved are of general significance, as we now 
explain. 

A sequence $x = (\xi_1, \ldots, \xi_n)$ of $n$ 0's and 1's may be regarded as a vector in the $n$-dimensional vector space $V = \mathbb{F}_2^n$ over the field of two elements. If we define the weight $|x|$ of the vector $x$ to be the number of nonzero coordinates $\xi_i$, then

(i) $|x| \ge 0$ with equality if and only if $x = 0$,
(ii) $|x + y| \le |x| + |y|$.

The vector space $V$ acquires the structure of a metric space if we define the (Hamming) distance between the vectors $x$ and $y$ to be $d(x, y) = |x - y|$.

A binary linear code is a subspace U of the vector space V. If U has dimension 
k, then a generator matrix for the code is a k x n matrix G whose rows form a basis 
for U. The automorphism group of the code is the group of all permutations of the n 
coordinates which map U onto itself. An [n, k, d]-binary code is one for which V has 
dimension n, U has dimension k and d is the least weight of any nonzero vector in U. 

There are useful connections between codes and designs. Corresponding to any 
design with incidence matrix A there is the binary linear code generated over F2 by 
the rows of A. Given a binary linear code U, on the other hand, a theorem of Assmus 
and Mattson (1969) provides conditions under which the nonzero vectors in U with 
minimum weight form the rows of the incidence matrix of a t-design. 

Suppose now that $H$ is a Hadamard matrix of order $n$, normalized so that all elements in the first row are 1. Then $A = (H + J_n)/2$ is a matrix of 0's and 1's with all elements in the first row 1. The code $C(H)$ defined by the Hadamard matrix $H$ is the subspace generated by the rows of $A$, considered as vectors in the $n$-dimensional vector space $V = \mathbb{F}_2^n$.

In particular, take $H = H_{24}$ to be the Hadamard matrix of order 24 formed by Paley’s construction:

$$H_{24} = I_{24} + \begin{pmatrix} 0 & e_{23}^t \\ -e_{23} & Q \end{pmatrix},$$

where $e_{23}$ is the column vector of 23 1's and $Q = (q_{jk})$ with $q_{jk} = 0$ if $j = k$ and otherwise $q_{jk} = 1$ or $-1$ according as $j - k$ is or is not a square mod 23 ($0 \le j, k \le 22$). It may be shown that the extended binary Golay code $G_{24} = C(H_{24})$ is a 12-dimensional subspace of $\mathbb{F}_2^{24}$, that the minimum weight of any nonzero vector in $G_{24}$ is 8, and that the sets of nonzero coordinates of the vectors $x \in G_{24}$ with $|x| = 8$ form the blocks of a 5-$(24, 8, 1)$ design. The Mathieu group $M_{24}$ may be defined as the automorphism group of this design or as the automorphism group of the code $G_{24}$.
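The text states these facts about $G_{24}$ without proof, but they are easy to confirm by brute force, since the code has only $2^{12} = 4096$ words. The sketch below uses function names of our own. It builds $H_{24}$ as displayed, spans the rows of $A = (H_{24} + J_{24})/2$ over $\mathbb{F}_2$ using integer bit masks, and reports three numbers: the dimension, the minimum nonzero weight, and the number of words of weight 8.

```python
def paley_h24():
    """H_24 = I_24 + [[0, e^t], [-e, Q]], q_jk = 1 or -1 as j - k is or is not a square mod 23."""
    p = 23
    squares = {(x * x) % p for x in range(1, p)}
    n = p + 1
    h = [[0] * n for _ in range(n)]
    for k in range(1, n):
        h[0][k] = 1    # first row: e^t
        h[k][0] = -1   # first column: -e
    for j in range(p):
        for k in range(p):
            if j != k:
                h[j + 1][k + 1] = 1 if (j - k) % p in squares else -1
    for i in range(n):
        h[i][i] += 1   # add the identity I_24
    return h


def code_from_hadamard(h):
    """All codewords of C(H), the F_2-span of the rows of A = (H + J)/2, as integer bit masks."""
    rows = [sum(1 << pos for pos, entry in enumerate(row) if entry == 1) for row in h]
    span = {0}
    for r in rows:
        span |= {c ^ r for c in span}   # add r to every word found so far
    return span


if __name__ == "__main__":
    code = code_from_hadamard(paley_h24())
    weights = [bin(c).count("1") for c in code if c]
    print(len(code).bit_length() - 1, min(weights), weights.count(8))
```

Run as a script, it prints `12 8 759`. The 759 words of weight 8 give the blocks of the 5-$(24, 8, 1)$ design, and 759 is indeed the number of blocks $b$ for those parameters.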
Again, suppose that $H^{(m)}$ is the Hadamard matrix of order $n = 2^m$ defined by

$$H^{(m)} = H_2 \otimes \cdots \otimes H_2 \quad (m \text{ factors}),$$

where

$$H_2 = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.$$

The first-order Reed–Muller code $R(1, m) = C(H^{(m)})$ is an $(m + 1)$-dimensional subspace of $\mathbb{F}_2^n$ and the minimum weight of any nonzero vector in $R(1, m)$ is $2^{m-1}$. It may be mentioned that the 3-$(2^m, 2^{m-1}, 2^{m-2} - 1)$ design associated with the Hadamard matrix $H^{(m)}$ has a simple geometrical interpretation. Its points are the points of $m$-dimensional affine space over the field of two elements, and its blocks are the hyperplanes of this space (not necessarily containing the origin).

In electronic communication a message is sent as a sequence of ‘bits’ (an abbreviation for binary digits), which may be realised physically by off or on and which may be denoted mathematically by 0 or 1. On account of noise the message received may
differ slightly from that transmitted, and in some situations it is extremely important to 
detect and correct the errors. One way of doing so would be to send the same message many times, but this is inefficient. Instead suppose the message is composed of codewords of length $n$, taken from a subspace $U$ of the vector space $V = \mathbb{F}_2^n$. There are $2^k$ different codewords, where $k$ is the dimension of $U$. If the minimum weight of any nonzero vector in $U$ is $d$, then any two distinct codewords differ in at least $d$ places. Hence if a codeword $u \in U$ is transmitted and the received vector $v \in V$ contains fewer than $d/2$ errors, then $v$ will be closer to $u$ than to any other codeword $w$, since by the triangle inequality $d(v, w) \ge d(u, w) - d(u, v) > d/2$. Thus if we are confident that any transmitted codeword will contain fewer than $d/2$ errors, we can correct them all by replacing each received vector by the codeword nearest to it.
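The decoding rule can be tried out on the Reed–Muller codes. In this sketch, which uses helper names of our own, $R(1, m)$ is generated exactly as $C(H^{(m)})$ above. A received word is then decoded by exhaustive search for the nearest codeword. The search is for illustration only: its cost grows with the number of codewords, and it is not a fast decoding algorithm.

```python
def kron(x, y):
    """Kronecker product of two matrices given as lists of rows."""
    return [[xi * yj for xi in xr for yj in yr] for xr in x for yr in y]


def reed_muller_1(m):
    """R(1, m) = C(H^(m)): F_2-span of the rows of A = (H + J)/2, as tuples of 0's and 1's."""
    h = [[1]]
    for _ in range(m):
        h = kron(h, [[1, 1], [1, -1]])
    rows = [tuple((x + 1) // 2 for x in row) for row in h]
    code = {tuple(0 for _ in rows[0])}
    for r in rows:
        code |= {tuple(a ^ b for a, b in zip(c, r)) for c in code}
    return code


def distance(x, y):
    """Hamming distance d(x, y) = |x - y|."""
    return sum(a != b for a, b in zip(x, y))


def nearest_codeword(received, code):
    """Minimum-distance decoding by exhaustive search over the code."""
    return min(code, key=lambda c: distance(c, received))
```

With $m = 5$ the code has $2^6 = 64$ words of length 32 and minimum weight $d = 16$. Any pattern of at most 7 errors is therefore corrected.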

The Golay code and the first-order Reed–Muller codes are of considerable practical importance in this connection. For the first-order Reed–Muller codes there is a fast algorithm for finding the nearest codeword to any received vector. Photographs of Mars taken by the Mariner 9 spacecraft were transmitted to Earth, using the code $R(1, 5)$.

Other error-correcting codes are used with compact discs to ensure high quality 
sound reproduction by eliminating imperfections due, for example, to dust particles. 
Kowalewski [22] gives a useful traditional account of determinants. Muir [28] is a
storehouse of information on special types of determinants; the early Japanese work is 
described in Mikami [27]. 

Another approach to determinants, based on the work of Grassmann (1844), should be mentioned here, as it provides easy access to their formal properties and is used in the theory of differential forms. If $V$ is an $n$-dimensional vector space over a field $F$, then there exists an associative algebra $E$, of dimension $2^n$ as a vector space over $F$, such that

(a) $V \subseteq E$,

(b) $v^2 = 0$ for every $v \in V$,

(c) $V$ generates $E$, i.e. each element of $E$ can be expressed as a sum of a scalar multiple of the unit element 1 and of a finite number of products of elements of $V$.

The associative algebra $E$, which is uniquely determined by these properties, is called the Grassmann algebra or exterior algebra of the vector space $V$. It is easily seen that any two products of $n$ elements of $V$ differ only by a scalar factor. Hence, for any linear transformation $A: V \to V$, there exists $d(A) \in F$ such that

$$(Av_1)\cdots(Av_n) = d(A)\,v_1 \cdots v_n \quad \text{for all } v_1, \ldots, v_n \in V.$$

Evidently $d(AB) = d(A)d(B)$ and in fact $d(A) = \det A$, if we identify $A$ with its matrix with respect to some fixed basis of $V$. This approach to determinants is developed in Bourbaki [6]; see also Barnabei et al. [4].

Dieudonné (1943) has extended the notion of determinant to matrices with entries 
from a division ring; see Artin [1] and Cohn [9]. For a very different method, see 
Gelfand and Retakh [13]. 
Hadamard’s original paper of 1893 is reproduced in [16]. Surveys on Hadamard
matrices have been given by Hedayat and Wallis [19], Seberry and Yamada [34], and 
Craigen and Wallis [11]. Weighing designs are treated in Raghavarao [31]. For appli- 
cations of Hadamard matrices to spectrometry, see Harwit and Sloane [18]. The proof 
of Proposition 8 is due to Shahriari [35]. 

Our proof of Theorem 10 is a pure existence proof. A more constructive approach 
was proposed by Jacobi (1846). If one applies to n x n matrices the method which we 
used for 2 x 2 matrices, one can annihilate a symmetric pair of off-diagonal entries. By 
choosing at each step an off-diagonal pair with maximum absolute value, one obtains 
a sequence of orthogonal transforms of the given symmetric matrix which converges 
to a diagonal matrix. 
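Jacobi’s procedure is short enough to sketch directly. The following Python version is our own code and uses explicit rotation matrices for clarity rather than efficiency. At each step it picks the off-diagonal entry of largest absolute value. It then chooses the angle $\theta$ with $\tan 2\theta = 2a_{pq}/(a_{qq} - a_{pp})$, which makes the rotated $(p, q)$ entry vanish, and repeats until the matrix is numerically diagonal.

```python
import math


def matmul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x[i][l] * y[l][j] for l in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def jacobi_eigenvalues(a, tol=1e-12, max_rotations=10_000):
    """Eigenvalues (sorted) of a real symmetric matrix by classical Jacobi rotations.

    Each step replaces A by R^t A R, where R is a plane rotation in coordinates p, q
    chosen so that the new (p, q) entry is zero; it stops once every off-diagonal
    entry is smaller than tol in absolute value.
    """
    n = len(a)
    a = [list(map(float, row)) for row in a]
    for _ in range(max_rotations):
        p, q, largest = 0, 0, 0.0
        for i in range(n):
            for j in range(i + 1, n):
                if abs(a[i][j]) > largest:
                    p, q, largest = i, j, abs(a[i][j])
        if largest < tol:
            break
        theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
        c, s = math.cos(theta), math.sin(theta)
        rot = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
        rot[p][p] = rot[q][q] = c
        rot[p][q], rot[q][p] = s, -s
        rot_t = [list(col) for col in zip(*rot)]
        a = matmul(matmul(rot_t, a), rot)
    return sorted(a[i][i] for i in range(n))
```

For instance, the matrix $3I + 2J$ of order 4 should yield eigenvalues 3, 3, 3 and 11 up to rounding. This agrees with $\det(aI + bJ) = a^{n-1}(a + bn)$.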

Calculating the eigenvalues of a real symmetric matrix has important practical 
applications, e.g. to problems of small oscillations in dynamical systems. House- 
holder [21] and Golub and van Loan [14] give accounts of the various computational 
methods available. 

Gantmacher [12] and Horn and Johnson [20] give general treatments of matrix 
theory, including the inequalities of Hadamard and Fischer. Our discussion of the 
Hadamard determinant problem for matrices of order not divisible by 4 is mainly based 
on Wojtas [37]. Further references are given in Neubauer and Ratcliffe [29]. 

Results of Brouwer (1983) are used in [29] to show that the upper bound in Proposition 19 is attained for infinitely many values of $n$. It follows that the upper bound in Proposition 20, with $m = n$, is also attained for infinitely many values of $n$. For if the $n \times n$ matrix $A$ satisfies

$$A^tA = (n - 1)I_n + J_n,$$

then the $2n \times 2n$ matrix

$$\tilde{A} = \begin{pmatrix} A & A \\ -A & A \end{pmatrix}$$

satisfies

$$\tilde{A}^t\tilde{A} = \begin{pmatrix} L & 0 \\ 0 & L \end{pmatrix},$$

where $L = 2A^tA = (2n - 2)I_n + 2J_n$.

There are introductions to design theory in Ryser [33], Hall [17], and van Lint and 
Wilson [25]. For more detailed information, see Brouwer [7], Lander [23] and Beth 
et al. [5]. Applications of design theory are treated in Chapter XIII of [5]. 

We mention two interesting results which are proved in Chapter 16 of Hall [17]. Given positive integers $v, k, \lambda$ with $\lambda < k < v$:

(i) If $k(k - 1) = \lambda(v - 1)$ and if there exists a $v \times v$ matrix $A$ of rational numbers such that

$$AA^t = (k - \lambda)I + \lambda J,$$

then $A$ may be chosen so that in addition $JA = kJ$.

(ii) If there exists a $v \times b$ matrix $A$ of integers such that

$$AA^t = (k - \lambda)I + \lambda J, \qquad JA = kJ,$$

then every entry of $A$ is either 0 or 1, and thus $A$ is the incidence matrix of a square 2-design.


For introductions to the classification theorem for finite simple groups, see 
Aschbacher [2] and Gorenstein [15]. Detailed information about the finite simple 
groups is given in Conway et al. [10]. There is a remarkable connection between 
the largest sporadic simple group, nicknamed the ‘Monster’, and modular forms; see 
Ray [32]. 

Good introductions to coding theory are given by van Lint [24] and Pless [30]. 
MacWilliams and Sloane [26] is more comprehensive, but less up-to-date. Assmus and 
Mattson [3] is a useful survey article. Connections between codes, designs and graphs 
are treated in Cameron and van Lint [8]. The historical account in Thompson [36] 
recaptures the excitement of scientific discovery. 
VI


Hensel’s p-adic Numbers 


The ring $\mathbb{Z}$ of all integers has a very similar algebraic structure to the ring $\mathbb{C}[z]$ of all polynomials in one variable with complex coefficients. This similarity extends to their fields of fractions: the field $\mathbb{Q}$ of rational numbers and the field $\mathbb{C}(z)$ of rational functions in one variable with complex coefficients. Hensel (1899) had the bold idea of pushing this analogy even further. For any $\zeta \in \mathbb{C}$, the ring $\mathbb{C}[z]$ may be embedded in the ring $\mathbb{C}_\zeta[[z]]$ of all functions $f(z) = \sum_{n \ge 0} a_n(z - \zeta)^n$ with complex coefficients $a_n$ which are holomorphic at $\zeta$, and the field $\mathbb{C}(z)$ may be embedded in the field $\mathbb{C}_\zeta((z))$ of all functions $f(z) = \sum_{n \in \mathbb{Z}} a_n(z - \zeta)^n$ with complex coefficients $a_n$ which are meromorphic at $\zeta$, i.e. $a_n \ne 0$ for at most finitely many $n < 0$. Hensel constructed, for each prime $p$, a ring $\mathbb{Z}_p$ of all ‘$p$-adic integers’ $\sum_{n \ge 0} a_n p^n$, where $a_n \in \{0, 1, \ldots, p - 1\}$, and a field $\mathbb{Q}_p$ of all ‘$p$-adic numbers’ $\sum_{n \in \mathbb{Z}} a_n p^n$, where $a_n \in \{0, 1, \ldots, p - 1\}$ and $a_n \ne 0$ for at most finitely many $n < 0$. This led him to arithmetic analogues of various analytic results and even to analytic methods of proving them. Hensel’s idea of concentrating attention on one prime at a time has proved very fruitful for algebraic number theory. Furthermore, his methods enable the theory of algebraic numbers and the theory of algebraic functions of one variable to be developed completely in parallel.
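Hensel’s digits can be computed for any rational number whose denominator is prime to $p$. The sketch below is not Hensel’s own procedure, just the obvious recursion, and the function name is ours. It repeatedly takes the digit $a \in \{0, \ldots, p - 1\}$ congruent to $x$ modulo $p$ and replaces $x$ by $(x - a)/p$. It assumes Python 3.8 or later, for the modular inverse `pow(d, -1, p)`.

```python
from fractions import Fraction


def padic_digits(x, p, count):
    """First `count` digits a_0, a_1, ... of the p-adic expansion x = sum a_n p^n.

    Assumes p is prime and x is a rational whose denominator is prime to p,
    i.e. x lies in Z_p; each digit is the a in {0, ..., p-1} with x = a (mod p).
    """
    x = Fraction(x)
    if x.denominator % p == 0:
        raise ValueError("x is not a p-adic integer")
    digits = []
    for _ in range(count):
        a = x.numerator * pow(x.denominator, -1, p) % p
        digits.append(a)
        x = (x - a) / p   # x - a is divisible by p, so the denominator stays prime to p
    return digits
```

For example, $-1 = 4 + 4 \cdot 5 + 4 \cdot 5^2 + \cdots$ in $\mathbb{Z}_5$. The first five digits of $1/3$ in $\mathbb{Z}_5$ are 2, 3, 1, 3, 1.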

Hensel simply defined $p$-adic integers by their power series expansions. We will adopt a more general approach, due to Kürschák (1913), which is based on absolute values.


1 Valued Fields 


Let $F$ be an arbitrary field. An absolute value on $F$ is a map $|\ |: F \to \mathbb{R}$ with the following properties:

(V1) $|0| = 0$, $|a| > 0$ for all $a \in F$ with $a \ne 0$;

(V2) $|ab| = |a||b|$ for all $a, b \in F$;

(V3) $|a + b| \le |a| + |b|$ for all $a, b \in F$.

A field with an absolute value will be called simply a valued field.

A non-archimedean absolute value on $F$ is a map $|\ |: F \to \mathbb{R}$ with the properties (V1), (V2) and

(V3') $|a + b| \le \max(|a|, |b|)$ for all $a, b \in F$.
A non-archimedean absolute value is indeed an absolute value, since (V1) implies that (V3') is a strengthening of (V3). An absolute value is said to be archimedean if it is not non-archimedean.

The inequality (V3) is usually referred to as the triangle inequality and (V3') as the ‘strong triangle’, or ultrametric, inequality.

If $F$ is a field with an absolute value $|\ |$, then the set of real numbers $|a|$ for all nonzero $a \in F$ is clearly a subgroup of the multiplicative group of positive real numbers. This subgroup will be called the value group of the valued field.

Here are some examples to illustrate these definitions: 


(i) An arbitrary field $F$ has a trivial non-archimedean absolute value defined by
$$|0| = 0, \qquad |a| = 1 \text{ if } a \ne 0.$$
(ii) The ordinary absolute value
$$|a| = a \text{ if } a \ge 0, \qquad |a| = -a \text{ if } a < 0,$$

defines an archimedean absolute value on the field $\mathbb{Q}$ of rational numbers. We will denote this absolute value by $|\ |_\infty$ to avoid confusion with other absolute values on $\mathbb{Q}$ which will now be defined.

If $p$ is a fixed prime, any rational number $a \ne 0$ can be uniquely expressed in the form $a = \varepsilon p^v m/n$, where $\varepsilon = \pm 1$, $v = v_p(a)$ is an integer and $m, n$ are relatively prime positive integers which are not divisible by $p$. It is easily verified that a non-archimedean absolute value is defined on $\mathbb{Q}$ by putting

$$|0|_p = 0, \qquad |a|_p = p^{-v_p(a)} \text{ if } a \ne 0.$$

We call this the $p$-adic absolute value.


(iii) Let $F = K(t)$ be the field of all rational functions in one indeterminate with coefficients from some field $K$. Any rational function $f \ne 0$ can be uniquely expressed in the form $f = g/h$, where $g$ and $h$ are relatively prime polynomials with coefficients from $K$ and $h$ is monic (i.e., has leading coefficient 1). If we denote the degrees of $g$ and $h$ by $\partial(g)$ and $\partial(h)$, then a non-archimedean absolute value is defined on $F$ by putting, for a fixed $q > 1$,

$$|0|_\infty = 0, \qquad |f|_\infty = q^{\partial(g) - \partial(h)} \text{ if } f \ne 0.$$

Other absolute values on $F$ can be defined in the following way. If $p \in K[t]$ is a fixed irreducible polynomial, then any rational function $f \ne 0$ can be uniquely expressed in the form $f = p^v g/h$, where $v = v_p(f)$ is an integer, $g$ and $h$ are relatively prime polynomials with coefficients from $K$ which are not divisible by $p$, and $h$ is monic. It is easily verified that a non-archimedean absolute value is defined on $F$ by putting, for a fixed $q > 1$,

$$|0|_p = 0, \qquad |f|_p = q^{-\partial(p)v_p(f)} \text{ if } f \ne 0.$$
(iv) Let $F = K((t))$ be the field of all formal Laurent series $f(t) = \sum_{n \in \mathbb{Z}} a_n t^n$ with coefficients $a_n \in K$ such that $a_n \ne 0$ for at most finitely many $n < 0$. A non-archimedean absolute value is defined on $F$ by putting, for a fixed $q > 1$,

$$|0| = 0, \qquad |f| = q^{-v(f)} \text{ if } f \ne 0,$$

where $v(f)$ is the least integer $n$ such that $a_n \ne 0$.

(v) Let $F = \mathbb{C}_\zeta((z))$ denote the field of all complex-valued functions $f(z) = \sum_{n \in \mathbb{Z}} a_n(z - \zeta)^n$ which are meromorphic at $\zeta \in \mathbb{C}$. Any $f \in F$ which is not identically zero can be uniquely expressed in the form $f(z) = (z - \zeta)^v g(z)$, where $v = v_\zeta(f)$ is an integer, $g$ is holomorphic at $\zeta$ and $g(\zeta) \ne 0$. A non-archimedean absolute value is defined on $F$ by putting, for a fixed $q > 1$,

$$|0| = 0, \qquad |f| = q^{-v_\zeta(f)} \text{ if } f \ne 0.$$

It should be noted that in examples (iii) and (iv) the restriction of the absolute value to the ground field $K$ is the trivial absolute value, and the same holds in example (v) for the restriction of the absolute value to $\mathbb{C}$. For all the absolute values considered in examples (iii)–(v) the value group is an infinite cyclic group.
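The $p$-adic absolute value of example (ii) can be made concrete. In the sketch below, the function names `vp` and `abs_p` are ours. $v_p(a)$ is computed by stripping factors of $p$ from the numerator and denominator of $a$. The strong triangle inequality (V3') is then tested on a small grid of rationals, which is a finite sanity check and not a proof.

```python
from fractions import Fraction
from itertools import product


def vp(a, p):
    """The exponent v = v_p(a) in a = e p^v m/n, for a nonzero rational a and a prime p."""
    a = Fraction(a)
    if a == 0:
        raise ValueError("v_p(0) is undefined")
    v, num, den = 0, a.numerator, a.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v


def abs_p(a, p):
    """The p-adic absolute value |a|_p = p^(-v_p(a)), with |0|_p = 0."""
    a = Fraction(a)
    return Fraction(0) if a == 0 else Fraction(p) ** (-vp(a, p))


if __name__ == "__main__":
    p = 3
    samples = [Fraction(n, d) for n in range(-10, 11) for d in (1, 2, 3, 9, 10)]
    # (V3'): |a + b|_p <= max(|a|_p, |b|_p) on every pair from the grid
    assert all(abs_p(a + b, p) <= max(abs_p(a, p), abs_p(b, p))
               for a, b in product(samples, repeat=2))
    print(abs_p(Fraction(72, 5), 2), abs_p(Fraction(72, 5), 3), abs_p(Fraction(72, 5), 5))
```

Run as a script, it prints `1/8 1/9 5`, since $72/5 = 2^3 \cdot 3^2 \cdot 5^{-1}$.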

We now derive some simple properties common to all absolute values. The 
notation in the statement of the following lemma is a bit sloppy, since we use 
the same symbol to denote the unit elements of both F and R (as we have already 
done for the zero elements). 


Lemma 1 In any field $F$ with an absolute value $|\ |$ the following properties hold:

(i) $|1| = 1$, $|-1| = 1$ and, more generally, $|a| = 1$ for every $a \in F$ which is a root of unity;
(ii) $|-a| = |a|$ for every $a \in F$;
(iii) $\big||a| - |b|\big|_\infty \le |a - b|$ for all $a, b \in F$, where $|\ |_\infty$ is the ordinary absolute value on $\mathbb{R}$;
(iv) $|a^{-1}| = |a|^{-1}$ for every $a \in F$ with $a \ne 0$.


Proof By taking $a = b = 1$ in (V2) and using (V1), we obtain $|1| = 1$. If $a^n = 1$ for some positive integer $n$, it now follows from (V2) that $\alpha = |a|$ satisfies $\alpha^n = 1$. Since $\alpha > 0$, this implies $\alpha = 1$. In particular, $|-1| = 1$. Taking $b = -1$ in (V2), we now obtain (ii).

Replacing $a$ by $a - b$ in (V3), we obtain

$$|a| - |b| \le |a - b|.$$

Since $a$ and $b$ may be interchanged, by (ii), this implies (iii). Finally, if we take $b = a^{-1}$ in (V2) and use (i), we obtain (iv).


It follows from Lemma 1 (i) that a finite field admits only the trivial absolute value. 

We show next how non-archimedean and archimedean absolute values may be distinguished from one another. The notation in the statement of the following proposition is very sloppy, since we use the same symbol to denote both the positive integer $n$ and the sum $1 + 1 + \cdots + 1$ ($n$ summands), although the latter may be 0 if the field has prime characteristic.
Proposition 2 Let $F$ be a field with an absolute value $|\ |$. Then the following properties are equivalent:

(i) $|2| \le 1$;
(ii) $|n| \le 1$ for every positive integer $n$;
(iii) the absolute value $|\ |$ is non-archimedean.


Proof It is trivial that (iii) $\Rightarrow$ (i). Suppose now that (i) holds. Then $|2^k| = |2|^k \le 1$
for any positive integer $k$. An arbitrary positive integer $n$ can be written to the base 2
in the form

$$n = a_0 + a_1 2 + \cdots + a_g 2^g,$$

where $a_i \in \{0, 1\}$ for all $i < g$ and $a_g = 1$. Then
$$|n| \le |a_0| + |a_1| + \cdots + |a_g| \le g + 1.$$

Now consider the powers $n^k$. Since $n < 2^{g+1}$, we have $n^k < 2^{k(g+1)}$ and hence
$$n^k = b_0 + b_1 2 + \cdots + b_h 2^h,$$
where $b_j \in \{0, 1\}$ for all $j < h$, $b_h = 1$ and $h < k(g+1)$. Thus
$$|n|^k = |n^k| \le h + 1 \le k(g+1).$$
Taking $k$-th roots and letting $k \to \infty$, we obtain $|n| \le 1$, since $k^{1/k} = e^{(\log k)/k} \to 1$
and likewise $(g+1)^{1/k} = e^{(\log(g+1))/k} \to 1$. Thus (i) $\Rightarrow$ (ii).


Suppose next that (ii) holds. Then, since the binomial coefficients are positive
integers,

$$|x + y|^n = |(x + y)^n| = \Big|\sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}\Big| \le \sum_{k=0}^{n} |x|^k |y|^{n-k} \le (n + 1)\rho^n,$$

where $\rho = \max(|x|, |y|)$. Taking $n$-th roots and letting $n \to \infty$, we obtain $|x + y| \le \rho$.
Thus (ii) $\Rightarrow$ (iii).


It follows from Proposition 2 that for an archimedean absolute value the sequence
$(|n|)$ is unbounded, since $|2^k| \to \infty$ as $k \to \infty$. Consequently, for any $a, b \in F$ with
$a \ne 0$, there is a positive integer $n$ such that $|na| > |b|$. The name ‘archimedean’
is used because of the analogy with the archimedean axiom of geometry. It follows
also from Proposition 2 that any absolute value on a field of prime characteristic is
non-archimedean, since there are only finitely many distinct values of $|n|$.
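As an informal illustration of Proposition 2 (not part of the original argument), the sketch below implements the $p$-adic absolute value $|a|_p = p^{-v_p(a)}$ on $\mathbb{Q}$ with exact rational arithmetic. The helper names `vp` and `abs_p` are our own. It checks property (ii), $|n|_p \le 1$ for positive integers, and the ultrametric inequality (iii) on a small grid of rationals.

```python
from fractions import Fraction
from itertools import product

def vp(a, p):
    """p-adic valuation v_p(a) of a nonzero rational a (p assumed prime)."""
    a = Fraction(a)
    if a == 0:
        raise ValueError("v_p(0) is +infinity")
    v, num, den = 0, a.numerator, a.denominator
    while num % p == 0:        # powers of p in the numerator raise v
        num //= p
        v += 1
    while den % p == 0:        # powers of p in the denominator lower v
        den //= p
        v -= 1
    return v

def abs_p(a, p):
    """|a|_p = p**(-v_p(a)), with |0|_p = 0, returned as an exact Fraction."""
    a = Fraction(a)
    if a == 0:
        return Fraction(0)
    return Fraction(p) ** (-vp(a, p))

# Property (ii) of Proposition 2: |n|_p <= 1 for every positive integer n.
assert all(abs_p(n, 3) <= 1 for n in range(1, 500))

# Property (iii): the ultrametric inequality |x + y|_p <= max(|x|_p, |y|_p).
samples = [Fraction(n, d) for n, d in product(range(-6, 7), range(1, 7))]
assert all(abs_p(x + y, 2) <= max(abs_p(x, 2), abs_p(y, 2))
           for x, y in product(samples, repeat=2))
```

For example, `abs_p(Fraction(3, 8), 2)` is `Fraction(8, 1)`. The ordinary absolute value, by contrast, has $|2^k|_\infty = 2^k$, which is unbounded.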
2 Equivalence


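If $\lambda, \mu, \alpha$ are positive real numbers with $\alpha < 1$, then

$$\Big(\frac{\lambda}{\lambda+\mu}\Big)^{\alpha} + \Big(\frac{\mu}{\lambda+\mu}\Big)^{\alpha} > \frac{\lambda}{\lambda+\mu} + \frac{\mu}{\lambda+\mu} = 1$$

and hence

$$\lambda^{\alpha} + \mu^{\alpha} > (\lambda + \mu)^{\alpha}.$$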

It follows that if $|\cdot|$ is an absolute value on a field $F$ and if $0 < \alpha < 1$, then $|\cdot|^{\alpha}$ is also
an absolute value, since

$$|a + b|^{\alpha} \le (|a| + |b|)^{\alpha} \le |a|^{\alpha} + |b|^{\alpha}.$$


Actually, if $|\cdot|$ is a non-archimedean absolute value on a field $F$, then it follows directly
from the definition that, for any $\alpha > 0$, $|\cdot|^{\alpha}$ is also a non-archimedean absolute value
on $F$. However, if $|\cdot|$ is an archimedean absolute value on $F$ then, for all large $\alpha > 0$,
$|\cdot|^{\alpha}$ is not an absolute value on $F$. For $|2| > 1$ and hence, if $\alpha > \log 2/\log|2|$,

$$|1 + 1|^{\alpha} > 2 = |1|^{\alpha} + |1|^{\alpha}.$$


Proposition 3 Let $|\cdot|_1$ and $|\cdot|_2$ be absolute values on a field $F$ such that $|a|_2 < 1$ for
any $a \in F$ with $|a|_1 < 1$. If $|\cdot|_1$ is nontrivial, then there exists a real number $\rho > 0$
such that

$$|a|_2 = |a|_1^{\rho} \quad \text{for every } a \in F.$$


Proof By taking inverses we see that also $|a|_2 > 1$ for any $a \in F$ with $|a|_1 > 1$.
Choose $b \in F$ with $|b|_1 > 1$. For any nonzero $a \in F$ we have $|a|_1 = |b|_1^{\gamma}$, where

$$\gamma = \log|a|_1/\log|b|_1.$$

Let $m, n$ be integers with $n > 0$ such that $m/n > \gamma$. Then $|a|_1^n < |b|_1^m$ and hence
$|a^n/b^m|_1 < 1$. Therefore also $|a^n/b^m|_2 < 1$ and by reversing the argument we obtain

$$m/n > \log|a|_2/\log|b|_2.$$
Similarly if $m', n'$ are integers with $n' > 0$ such that $m'/n' < \gamma$, then
$$m'/n' < \log|a|_2/\log|b|_2.$$
It follows that
$$\log|a|_2/\log|b|_2 = \gamma = \log|a|_1/\log|b|_1.$$

Thus if we put $\rho = \log|b|_2/\log|b|_1$, then $\rho > 0$ and $|a|_2 = |a|_1^{\rho}$. This holds trivially
also for $a = 0$.
Two absolute values, $|\cdot|_1$ and $|\cdot|_2$, on a field $F$ are said to be equivalent when, for
any $a \in F$,

$$|a|_1 < 1 \text{ if and only if } |a|_2 < 1.$$

This implies that $|a|_1 > 1$ if and only if $|a|_2 > 1$ and hence also that $|a|_1 = 1$ if and
only if $|a|_2 = 1$. Thus if one absolute value is trivial, so also is the other. It now follows
from Proposition 3 that two absolute values, $|\cdot|_1$ and $|\cdot|_2$, on a field $F$ are equivalent if
and only if there exists a real number $\rho > 0$ such that $|a|_2 = |a|_1^{\rho}$ for every $a \in F$.

We have seen that the field $\mathbb{Q}$ of rational numbers admits the $p$-adic absolute
values $|\cdot|_p$ in addition to the ordinary absolute value $|\cdot|_\infty$. These absolute values are
all inequivalent since, if $p$ and $q$ are distinct primes,

$$|p|_p < 1, \quad |p|_q = 1, \quad |p|_\infty = p > 1.$$


It was first shown by Ostrowski (1918) that these are essentially the only absolute
values on $\mathbb{Q}$:


Proposition 4 Every nontrivial absolute value $|\cdot|$ of the rational field $\mathbb{Q}$ is equivalent
either to the ordinary absolute value $|\cdot|_\infty$ or to a $p$-adic absolute value $|\cdot|_p$ for some
prime $p$.


Proof Let $b, c$ be integers $> 1$. By writing $c$ to the base $b$, we obtain
$$c = c_m b^m + c_{m-1} b^{m-1} + \cdots + c_0,$$

where $0 \le c_j < b$ $(j = 0, \ldots, m)$ and $c_m \ne 0$. Then $m \le \log c/\log b$, since $c_m \ge 1$.
If we put $\mu = \max_{1 \le d < b} |d|$, it follows from the triangle inequality that

$$|c| \le \mu(1 + \log c/\log b)\{\max(1, |b|)\}^{\log c/\log b}.$$
Taking $c = a^n$ we obtain, for any $a > 1$,
$$|a| \le \mu^{1/n}(1 + n\log a/\log b)^{1/n}\{\max(1, |b|)\}^{\log a/\log b}$$
and hence, letting $n \to \infty$,
$$|a| \le \{\max(1, |b|)\}^{\log a/\log b}.$$


Suppose first that $|a| > 1$ for some $a > 1$. It follows that $|b| > 1$ for every $b > 1$
and

$$|b|^{1/\log b} \ge |a|^{1/\log a}.$$
In fact, since $a$ and $b$ may now be interchanged,
$$|b|^{1/\log b} = |a|^{1/\log a}.$$

Thus $\rho = \log|a|/\log a$ is a positive real number independent of $a > 1$ and $|a| = a^{\rho}$.
It follows that $|a| = |a|_\infty^{\rho}$ for every rational number $a$. Thus the absolute value is
equivalent to the ordinary absolute value.
Suppose next that $|a| \le 1$ for every $a > 1$ and so for every $a \in \mathbb{Z}$. Since the
absolute value on $\mathbb{Q}$ is nontrivial, we must have $|a| < 1$ for some integer $a \ne 0$. The
set $M$ of all $a \in \mathbb{Z}$ such that $|a| < 1$ is a proper ideal in $\mathbb{Z}$ and hence is generated by
an integer $p > 1$. We will show that $p$ must be a prime. Suppose $p = bc$, where $b$
and $c$ are positive integers. Since $|b||c| = |p| < 1$, we may assume without loss of
generality that $|b| < 1$. Then $b \in M$ and thus $b = pd$ for some $d \in \mathbb{Z}$. Hence $cd = 1$
and so $c = 1$. Thus $p$ has no nontrivial factorization.

Every rational number $a \ne 0$ can be expressed in the form $a = p^v b/c$, where $v$ is
an integer and $b, c$ are integers not divisible by $p$. Hence $|b| = |c| = 1$ and $|a| = |p|^v$.
We can write $|p| = p^{-\rho}$, for some real number $\rho > 0$. Then $|a| = p^{-v\rho} = |a|_p^{\rho}$, and
thus the absolute value is equivalent to the $p$-adic absolute value.


Similarly, the absolute values on the field F = K(t) considered in example (iii) 
of §1 are all inequivalent and it may be shown that any nontrivial absolute value on F 
whose restriction to K is trivial is equivalent to one of these absolute values. 

In example (ii) of §1 we have made a specific choice in each class of equivalent
absolute values. The choice which has been made ensures the validity of the product
formula: for any nonzero $a \in \mathbb{Q}$,

$$|a|_\infty \prod_p |a|_p = 1,$$

where $|a|_p \ne 1$ for at most finitely many $p$.

Similarly, in example (iii) of §1 the absolute values have been chosen so that, for
any nonzero $f \in K(t)$, $|f|_\infty \prod_p |f|_p = 1$, where $|f|_p \ne 1$ for at most finitely
many $p$.
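The product formula for $\mathbb{Q}$ can be checked numerically. The sketch below is our own illustration, and the helper names `prime_factors`, `abs_p` and `product_formula` are hypothetical. It uses exact rationals and multiplies $|a|_\infty$ by $|a|_p$ only over the primes dividing the numerator or denominator of $a$, since $|a|_p = 1$ for every other prime. Trial division is used purely for simplicity.

```python
from fractions import Fraction

def prime_factors(n):
    """Set of distinct prime factors of a nonzero integer n (trial division)."""
    n, p, out = abs(n), 2, set()
    while p * p <= n:
        while n % p == 0:
            out.add(p)
            n //= p
        p += 1
    if n > 1:
        out.add(n)
    return out

def abs_p(a, p):
    """p-adic absolute value p**(-v_p(a)) of a nonzero Fraction a."""
    v, num, den = 0, a.numerator, a.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return Fraction(p) ** (-v)

def product_formula(a):
    """Return |a|_inf * prod_p |a|_p for a nonzero rational a (should be 1)."""
    a = Fraction(a)
    if a == 0:
        raise ValueError("the product formula concerns nonzero a")
    # Only primes dividing the numerator or denominator have |a|_p != 1.
    primes = prime_factors(a.numerator) | prime_factors(a.denominator)
    prod = abs(a)                      # the ordinary absolute value |a|_inf
    for p in primes:
        prod *= abs_p(a, p)
    return prod
```

For $a = -12/35$ the factors are $|a|_\infty = 12/35$, $|a|_2 = 1/4$, $|a|_3 = 1/3$, $|a|_5 = 5$ and $|a|_7 = 7$, and their product is 1.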

The following approximation theorem, due to Artin and Whaples (1945), treats 
several absolute values simultaneously. For p-adic absolute values of the rational field 
Q the result also follows from the Chinese remainder theorem (Corollary II.38). 


Proposition 5 Let $|\cdot|_1, \ldots, |\cdot|_m$ be nontrivial pairwise inequivalent absolute values of
an arbitrary field $F$ and let $x_1, \ldots, x_m$ be any elements of $F$. Then for each real $\varepsilon > 0$
there exists an $x \in F$ such that


$$|x - x_k|_k < \varepsilon \quad \text{for } 1 \le k \le m.$$
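Proof During the proof we will more than once use the fact that if $f_n(x) =
x^n(1 + x^n)^{-1}$, then $f_n(a) \to 0$ or $f_n(a) \to 1$ as $n \to \infty$, according as $|a| < 1$ or $|a| > 1$
(for $|a| > 1$ this holds because $|f_n(a) - 1| = |1 + a^n|^{-1} \le (|a|^n - 1)^{-1}$).
We show first that there exists an $a \in F$ such that
$$|a|_1 > 1, \quad |a|_k < 1 \text{ for } 2 \le k \le m.$$

Since $|\cdot|_1$ and $|\cdot|_2$ are nontrivial and inequivalent, there exist $b, c \in F$ such that

$$|b|_1 < 1, \quad |b|_2 \ge 1,$$
$$|c|_1 \ge 1, \quad |c|_2 < 1.$$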
If we put $a = b^{-1}c$, then $|a|_1 > 1$, $|a|_2 < 1$. This proves the assertion for $m = 2$. We
now assume $m > 2$ and use induction. Then there exist $b, c \in F$ such that

$$|b|_1 > 1, \quad |b|_k < 1 \text{ for } 1 < k < m,$$
$$|c|_1 > 1, \quad |c|_m < 1.$$

If $|b|_m < 1$ we can take $a = b$. If $|b|_m = 1$ we can take $a = b^n c$ for sufficiently large $n$.
If $|b|_m > 1$ we can take $a = f_n(b)c$ for sufficiently large $n$.
Thus for each $i \in \{1, \ldots, m\}$ we can choose $a_i \in F$ so that

$$|a_i|_i > 1, \quad |a_i|_k < 1 \text{ for all } k \ne i.$$
Then

$$x = x_1 f_n(a_1) + \cdots + x_m f_n(a_m)$$

satisfies the requirements of the proposition for sufficiently large $n$.
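For $p$-adic absolute values of $\mathbb{Q}$, Proposition 5 also follows from the Chinese remainder theorem, as remarked before its statement. Here is a minimal sketch of that special case. It assumes distinct primes and integer targets, and the function name `crt_approximate` is ours. It returns an integer $x$ with $x \equiv x_k \pmod{p_k^{e}}$, that is, $|x - x_k|_{p_k} \le p_k^{-e}$ for every $k$. Taking $e$ large makes all these distances smaller than any prescribed $\varepsilon$.

```python
def crt_approximate(targets, e):
    """targets: list of (p, x_p) with distinct primes p and integers x_p.

    Returns an integer x with x = x_p (mod p**e) for every pair, i.e.
    |x - x_p|_p <= p**(-e) for each of the p-adic absolute values involved.
    """
    x, modulus = 0, 1
    for p, xp in targets:
        m = p ** e
        # Solve x + modulus*t = xp (mod m); modulus is prime to m since the primes differ.
        t = ((xp - x) * pow(modulus, -1, m)) % m
        x += modulus * t          # keeps all earlier congruences intact
        modulus *= m
    return x % modulus
```

For instance, `crt_approximate([(2, 1), (3, 2), (5, 4)], 3)` gives an integer that is $2$-adically within $2^{-3}$ of 1, $3$-adically within $3^{-3}$ of 2 and $5$-adically within $5^{-3}$ of 4. The three-argument `pow` with exponent $-1$ requires Python 3.8 or later.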


It follows from Proposition 5 that if $|\cdot|_1, \ldots, |\cdot|_m$ are nontrivial pairwise inequivalent
absolute values of a field $F$, then there exists an $a \in F$ such that $|a|_k > 1$ $(k = 1, \ldots, m)$
(take $x_k \in F$ with $|x_k|_k > 2$ and $\varepsilon = 1$). Consequently the absolute values are multiplicatively independent, i.e. if
$\rho_1, \ldots, \rho_m$ are nonnegative real numbers, not all zero, then for some nonzero $a \in F$,

$$|a|_1^{\rho_1} \cdots |a|_m^{\rho_m} \ne 1.$$


3 Completions 


Any field $F$ with an absolute value $|\cdot|$ has the structure of a metric space, with the
metric

$$d(a, b) = |a - b|,$$

and thus has an associated topology. Since $|a| < 1$ if and only if $a^n \to 0$ as $n \to \infty$,
it follows that two absolute values are equivalent if and only if the induced topologies
are the same.

When we use topological concepts in connection with valued fields we will always
refer to the topology induced by the metric space structure. In this sense addition and
multiplication are continuous operations, since

$$|(a + b) - (a_0 + b_0)| \le |a - a_0| + |b - b_0|,$$
$$|ab - a_0 b_0| \le |a - a_0||b| + |a_0||b - b_0|.$$
Inversion is also continuous at any point $a_0 \ne 0$, since if $|a - a_0| < |a_0|/2$ then
$|a_0| < 2|a|$ and

$$|a^{-1} - a_0^{-1}| = |a - a_0||a|^{-1}|a_0|^{-1} < 2|a_0|^{-2}|a - a_0|.$$

Thus a valued field is a topological field.
It will now be shown that the procedure by which Cantor extended the field of
rational numbers to the field of real numbers can be generalized to any valued field. 

Let $F$ be a field with an absolute value $|\cdot|$. A sequence $(a_n)$ of elements of $F$ is said
to converge to an element $a$ of $F$, and $a$ is said to be the limit of the sequence $(a_n)$, if
for each real $\varepsilon > 0$ there is a corresponding positive integer $N = N(\varepsilon)$ such that

$$|a_n - a| < \varepsilon \quad \text{for all } n \ge N.$$


It is easily seen that the limit of a convergent sequence is uniquely determined.
A sequence $(a_n)$ of elements of $F$ is said to be a fundamental sequence if for each
$\varepsilon > 0$ there is a corresponding positive integer $N = N(\varepsilon)$ such that

$$|a_m - a_n| < \varepsilon \quad \text{for all } m, n \ge N.$$
Any convergent sequence is a fundamental sequence, since
$$|a_m - a_n| \le |a_m - a| + |a_n - a|,$$

but the converse need not hold. However, any fundamental sequence is bounded since,
if $m = N(1)$, then for $n > m$ we have

$$|a_n| \le |a_m - a_n| + |a_m| < 1 + |a_m|.$$

Thus $|a_n| \le \mu$ for all $n$, where $\mu = \max\{|a_1|, \ldots, |a_{m-1}|, 1 + |a_m|\}$.

The preceding definitions are specializations of the definitions for an arbitrary metric space (cf. Chapter I, §4). We now take advantage of the algebraic structure of $F$. Let
$A = (a_n)$ and $B = (b_n)$ be two fundamental sequences. We write $A = B$ if $a_n = b_n$
for all $n$, and we define the sum and product of $A$ and $B$ to be the sequences

$$A + B = (a_n + b_n), \quad AB = (a_n b_n).$$


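These are again fundamental sequences. For we can choose $\mu \ge 1$ so that $|a_n| \le \mu$,
$|b_n| \le \mu$ for all $n$ and then choose a positive integer $N$ so that

$$|a_m - a_n| < \varepsilon/2\mu, \quad |b_m - b_n| < \varepsilon/2\mu \quad \text{for all } m, n \ge N.$$
It follows that, for all $m, n \ge N$,
$$|(a_m + b_m) - (a_n + b_n)| \le |a_m - a_n| + |b_m - b_n| < \varepsilon/2\mu + \varepsilon/2\mu \le \varepsilon,$$
and similarly
$$|a_m b_m - a_n b_n| \le |a_m - a_n||b_m| + |a_n||b_m - b_n| < (\varepsilon/2\mu)\mu + (\varepsilon/2\mu)\mu = \varepsilon.$$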


It is easily seen that the set $\mathcal{F}$ of all fundamental sequences is a commutative ring
with respect to these operations. The subset of all constant sequences $(a)$, i.e. $a_n = a$
for all $n$, forms a field isomorphic to $F$. Thus we may regard $F$ as embedded in $\mathcal{F}$.

Let $\mathcal{N}$ denote the subset of $\mathcal{F}$ consisting of all sequences $(a_n)$ which converge
to 0. Evidently $\mathcal{N}$ is a subring of $\mathcal{F}$ and actually an ideal, since any fundamental
sequence is bounded. We will show that $\mathcal{N}$ is even a maximal ideal.
Let $(a_n)$ be a fundamental sequence which is not in $\mathcal{N}$. Then there exists $\mu > 0$
such that $|a_\nu| \ge \mu$ for infinitely many $\nu$. Since $|a_m - a_n| < \mu/2$ for all $m, n \ge N$, it
follows that $|a_n| > \mu/2$ for all $n \ge N$. Put $b_n = a_n^{-1}$ if $a_n \ne 0$, $b_n = 0$ if $a_n = 0$.
Then $(b_n)$ is a fundamental sequence since, for $m, n \ge N$,

$$|b_m - b_n| = |(a_n - a_m)/a_m a_n| \le 4\mu^{-2}|a_m - a_n|.$$

Since $(1) - (b_n a_n) \in \mathcal{N}$, the ideal generated by $(a_n)$ and $\mathcal{N}$ contains the constant
sequence $(1)$ and hence every sequence in $\mathcal{F}$. Since this holds for each sequence
$(a_n) \in \mathcal{F} \setminus \mathcal{N}$, the ideal $\mathcal{N}$ is maximal.

Consequently (see Chapter I, §8) the quotient $\bar F = \mathcal{F}/\mathcal{N}$ is a field. Since $(0)$ is
the only constant sequence in $\mathcal{N}$, by mapping each constant sequence into the coset
of $\mathcal{N}$ which contains it we obtain a field in $\bar F$ isomorphic to $F$. Thus we may regard
$F$ as embedded in $\bar F$.

It follows from Lemma 1(iii), and from the completeness of the field of real
numbers, that $|A| = \lim_{n\to\infty} |a_n|$ exists for any fundamental sequence $A = (a_n)$.
Moreover,

$$|A| \ge 0, \quad |AB| = |A||B|, \quad |A + B| \le |A| + |B|.$$

Furthermore $|A| = 0$ if and only if $A \in \mathcal{N}$. It follows that $|B| = |C|$ if $B - C \in \mathcal{N}$,
since

$$|B| \le |B - C| + |C| = |C| \le |C - B| + |B| = |B|.$$

Thus we may consider $|\cdot|$ as defined on $\bar F = \mathcal{F}/\mathcal{N}$, and it is then an absolute value
on the field $\bar F$ which coincides with the original absolute value when restricted to the
field $F$.

If $A = (a_n)$ is a fundamental sequence, and if $A_m$ is the constant sequence $(a_m)$,
then $|A - A_m|$ can be made arbitrarily small by taking $m$ sufficiently large. It follows
that $F$ is dense in $\bar F$, i.e. for any $\alpha \in \bar F$ and any $\varepsilon > 0$ there exists $a \in F$ such that
$|\alpha - a| < \varepsilon$.

We show finally that $\bar F$ is complete as a metric space, i.e. every fundamental
sequence of elements of $\bar F$ converges to an element of $\bar F$. For let $(\alpha_n)$ be a fundamental sequence in $\bar F$. Since $F$ is dense in $\bar F$, for each $n$ we can choose $a_n \in F$ so that
$|\alpha_n - a_n| < 1/n$. Since

$$|a_m - a_n| \le |a_m - \alpha_m| + |\alpha_m - \alpha_n| + |\alpha_n - a_n|,$$

it follows that $(a_n)$ is also a fundamental sequence. Thus there exists $\alpha \in \bar F$ such that
$\lim_{n\to\infty} |a_n - \alpha| = 0$. Since

$$|\alpha_n - \alpha| \le |\alpha_n - a_n| + |a_n - \alpha|,$$

we have also $\lim_{n\to\infty} |\alpha_n - \alpha| = 0$. Thus the sequence $(\alpha_n)$ converges to $\alpha$.
Summing up, we have proved


Proposition 6 If $F$ is a field with an absolute value $|\cdot|$, then there exists a field $\bar F$
containing $F$, with an absolute value $|\cdot|$ extending that of $F$, such that $\bar F$ is complete
and $F$ is dense in $\bar F$.
It is easily seen that $\bar F$ is uniquely determined, up to an isomorphism which
preserves the absolute value. The field $\bar F$ is called the completion of the valued field $F$.
The density of $F$ in $\bar F$ implies that the absolute value on the completion $\bar F$ is non-archimedean or archimedean according as the absolute value on $F$ is non-archimedean
or archimedean.

It is easy to see that in example (iv) of §1 the valued field $F = K((t))$ of all formal
Laurent series is complete, i.e. it is its own completion. For let $\{f^{(k)}\}$ be a fundamental
sequence in $F$. Given any positive integer $N$, there is a positive integer $M = M(N)$
such that $|f^{(j)} - f^{(k)}| < q^{-N}$ for $j, k \ge M$. Thus we can write

$$f^{(k)}(t) = \sum_{n \le N} a_n t^n + \sum_{n > N} a_n^{(k)} t^n \quad \text{for all } k \ge M.$$

If $f(t) = \sum_{n\in\mathbb{Z}} a_n t^n$, then $\lim_{k\to\infty} |f^{(k)} - f| = 0$.

On the other hand, given any $f(t) = \sum_{n\in\mathbb{Z}} a_n t^n \in K((t))$, we have $|f^{(k)} - f| \to 0$
as $k \to \infty$, where $f^{(k)}(t) = \sum_{n < k} a_n t^n \in K(t)$. It follows that $K((t))$ is the completion of the field $K(t)$ of rational functions considered in example (iii) of §1, with
the absolute value $|\cdot|_t$ corresponding to the irreducible polynomial $p(t) = t$ (for which
$\partial(p) = 1$).

The completion of the rational field $\mathbb{Q}$ with respect to the $p$-adic absolute value $|\cdot|_p$
will be denoted by $\mathbb{Q}_p$, and the elements of $\mathbb{Q}_p$ will be called $p$-adic numbers.

The completion of the rational field $\mathbb{Q}$ with respect to the ordinary absolute value
$|\cdot|_\infty$ is of course the real field $\mathbb{R}$. In §6 we will show that the only fields with a complete archimedean absolute value are the real field $\mathbb{R}$ and the complex field $\mathbb{C}$, and the
absolute value has the form $|\cdot|_\infty^{\rho}$ for some $\rho > 0$. In fact $\rho \le 1$, since $2^{\rho} = |1 + 1| \le |1| + |1| = 2$.
Thus an arbitrary archimedean valued field is equivalent to a subfield of $\mathbb{C}$ with the
usual absolute value. (Hence, for a field with an archimedean absolute value $|\cdot|$, $|n| > 1$
for every integer $n > 1$ and $|n| \to \infty$ as $n \to \infty$.) Since this case may be considered
well-known, we will in the following devote our attention primarily to the peculiarities
of non-archimedean valued fields.

We will later be concerned with extending an absolute value on a field $F$ to a field
$E$ which is a finite extension of $F$. Since all that matters for some purposes is that $E$
is a vector space over $F$, it is useful to introduce the following definition.

Let $F$ be a field with an absolute value $|\cdot|$ and let $E$ be a vector space over $F$.
A norm on $E$ is a map $\|\cdot\| : E \to \mathbb{R}$ with the following properties:


(i) $\|a\| > 0$ for every $a \in E$ with $a \ne 0$;
(ii) $\|\alpha a\| = |\alpha|\,\|a\|$ for all $\alpha \in F$ and $a \in E$;
(iii) $\|a + b\| \le \|a\| + \|b\|$ for all $a, b \in E$.


It follows from (ii) that $\|0\| = 0$. We will require only one result about normed vector
spaces:


Lemma 7 Let $F$ be a complete valued field and let $E$ be a finite-dimensional vector
space over $F$. If $\|\cdot\|_1$ and $\|\cdot\|_2$ are both norms on $E$, then there exist positive constants
$\sigma, \mu$ such that

$$\sigma\|a\|_1 \le \|a\|_2 \le \mu\|a\|_1 \quad \text{for every } a \in E.$$
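Proof Let $e_1, \ldots, e_n$ be a basis for the vector space $E$. Then any $a \in E$ can be
uniquely represented in the form

$$a = \alpha_1 e_1 + \cdots + \alpha_n e_n,$$
where $\alpha_1, \ldots, \alpha_n \in F$. It is easily seen that

$$\|a\|_0 = \max_{1 \le i \le n} |\alpha_i|$$

is a norm on $E$, and it is sufficient to prove the lemma for $\|\cdot\|_2 = \|\cdot\|_0$. Since
$$\|a\|_1 \le \|a\|_0(\|e_1\|_1 + \cdots + \|e_n\|_1),$$
we can take $\sigma = (\|e_1\|_1 + \cdots + \|e_n\|_1)^{-1}$. To establish the existence of $\mu$ we assume
$n > 1$ and use induction, since the result is obviously true for $n = 1$.
Assume, contrary to the assertion, that there exists a sequence $a^{(k)} \in E$ such that
$$\|a^{(k)}\|_1 \le \varepsilon_k \|a^{(k)}\|_0,$$
where $\varepsilon_k > 0$ and $\varepsilon_k \to 0$ as $k \to \infty$. We may suppose, without loss of generality (after renumbering the basis and passing to a subsequence), that
$$|\alpha_n^{(k)}| = \|a^{(k)}\|_0$$
and also, by replacing $a^{(k)}$ by $(\alpha_n^{(k)})^{-1} a^{(k)}$, that $\alpha_n^{(k)} = 1$. Thus $a^{(k)} = b^{(k)} + e_n$, where

$$b^{(k)} = \alpha_1^{(k)} e_1 + \cdots + \alpha_{n-1}^{(k)} e_{n-1},$$

and $\|a^{(k)}\|_1 \to 0$ as $k \to \infty$. The sequences $(\alpha_i^{(k)})$ $(i = 1, \ldots, n-1)$ are fundamental
sequences in $F$, since

$$\|b^{(j)} - b^{(k)}\|_1 \le \|b^{(j)} + e_n\|_1 + \|b^{(k)} + e_n\|_1 = \|a^{(j)}\|_1 + \|a^{(k)}\|_1$$
and, by the induction hypothesis,
$$|\alpha_i^{(j)} - \alpha_i^{(k)}| \le \mu_{n-1}\|b^{(j)} - b^{(k)}\|_1 \quad (i = 1, \ldots, n-1).$$

Hence, since $F$ is complete, there exist $\alpha_i \in F$ such that $|\alpha_i^{(k)} - \alpha_i| \to 0$ $(i = 1, \ldots, n-1)$. Put

$$b = \alpha_1 e_1 + \cdots + \alpha_{n-1} e_{n-1}.$$

Since $\|b^{(k)} - b\|_1 \le \sigma^{-1}\|b^{(k)} - b\|_0$, it follows that $\|b^{(k)} - b\|_1 \to 0$. But if $a = b + e_n$,
then

$$\|a\|_1 \le \|a - a^{(k)}\|_1 + \|a^{(k)}\|_1 = \|b - b^{(k)}\|_1 + \|a^{(k)}\|_1.$$

Letting $k \to \infty$, we obtain $a = 0$, which contradicts the definition of $a$.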
4 Non-Archimedean Valued Fields


Throughout this section we denote by $F$ a field with a non-archimedean absolute value
$|\cdot|$. A basic property of such fields is the following simple lemma. It may be interpreted
as saying that in ultrametric geometry every triangle is isosceles.


Lemma 8 If $a, b \in F$ and $|a| < |b|$, then $|a + b| = |b|$.
Proof We certainly have

$$|a + b| \le \max\{|a|, |b|\} = |b|.$$
On the other hand, since $b = (a + b) - a$, we have


$$|b| \le \max\{|a + b|, |-a|\}$$


and, since $|-a| = |a| < |b|$, this implies $|b| \le |a + b|$.


It may be noted that if $a \ne 0$ and $b = -a$, then $|a| = |b|$ and $|a + b| < |b|$. From
Lemma 8 it follows by induction that if $a_1, \ldots, a_n \in F$ and $|a_k| < |a_1|$ for $1 < k \le n$,
then


$$|a_1 + \cdots + a_n| = |a_1|.$$


As an application we show that if a field $E$ is a finite extension of a field $F$, then
the trivial absolute value on $E$ is the only extension to $E$ of the trivial absolute value
on $F$. By Proposition 2, any extension to $E$ of the trivial absolute value on $F$ must be
non-archimedean. Suppose $a \in E$ and $|a| > 1$. Then $a$ satisfies a polynomial equation


$$a^n + c_{n-1}a^{n-1} + \cdots + c_0 = 0$$


with coefficients $c_k \in F$. Since $|c_k| = 0$ or 1 and since $|a^k| < |a^n|$ if $k < n$, we obtain
the contradiction $|a^n| = |a^n + c_{n-1}a^{n-1} + \cdots + c_0| = 0$. Since $|a^{-1}| = |a|^{-1}$, there is also no nonzero $a \in E$ with $|a| < 1$.
As another application we prove


Proposition 9 If a field $F$ has a non-archimedean absolute value $|\cdot|$, then the
valuation on $F$ can be extended to the polynomial ring $F[t]$ by defining the absolute
value of $f(t) = a_0 + a_1 t + \cdots + a_n t^n$ to be $|f| = \max\{|a_0|, \ldots, |a_n|\}$.


Proof We need only show that $|fg| = |f||g|$, since it is evident that $|f| = 0$ if and
only if $f = 0$ and that $|f + g| \le |f| + |g|$. Let $g(t) = b_0 + b_1 t + \cdots + b_m t^m$. Then
$f(t)g(t) = c_0 + c_1 t + \cdots + c_{n+m} t^{n+m}$, where

$$c_i = a_0 b_i + a_1 b_{i-1} + \cdots + a_i b_0.$$

If $r$ is the least integer such that $|a_r| = |f|$ and $s$ the least integer such that
$|b_s| = |g|$, then $a_r b_s$ has strictly greatest absolute value among all products $a_j b_k$
with $j + k = r + s$. Hence $|c_{r+s}| = |a_r||b_s|$ and $|fg| \ge |f||g|$. On the other hand,

$$|fg| = \max_i |c_i| \le \max_{j,k} |a_j||b_k| = |f||g|.$$

Consequently $|fg| = |f||g|$. Clearly also $|f| = |a|$ if $f = a \in F$. (The absolute
value on $F$ can be further extended to the field $F(t)$ of rational functions by defining
$|f(t)/g(t)|$ to be $|f|/|g|$.)
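Proposition 9 can be tested on examples over $\mathbb{Q}$ with $|\cdot| = |\cdot|_p$. In the sketch below, polynomials are represented as coefficient lists $[a_0, a_1, \ldots]$. This representation and the names `abs_p`, `gauss_norm` and `poly_mul` are our own. `gauss_norm` computes $|f| = \max_i |a_i|_p$, and the loop confirms $|fg| = |f||g|$ for two sample polynomials and three primes.

```python
from fractions import Fraction

def abs_p(a, p):
    """p-adic absolute value of a rational number, with |0|_p = 0."""
    a = Fraction(a)
    if a == 0:
        return Fraction(0)
    v, num, den = 0, a.numerator, a.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return Fraction(p) ** (-v)

def gauss_norm(coeffs, p):
    """|f| = max |a_i|_p for f(t) = a_0 + a_1 t + ..., given as [a_0, a_1, ...]."""
    return max(abs_p(c, p) for c in coeffs)

def poly_mul(f, g):
    """Coefficients c_i = a_0 b_i + ... + a_i b_0 of the product f(t) g(t)."""
    h = [Fraction(0)] * (len(f) + len(g) - 1)
    for j, a in enumerate(f):
        for k, b in enumerate(g):
            h[j + k] += Fraction(a) * Fraction(b)
    return h

f = [Fraction(1, 2), 3, 4]      # 1/2 + 3t + 4t^2
g = [6, Fraction(1, 4), 2]      # 6 + t/4 + 2t^2
for p in (2, 3, 5):
    assert gauss_norm(poly_mul(f, g), p) == gauss_norm(f, p) * gauss_norm(g, p)
```

For $p = 2$ one has $|f| = 2$ and $|g| = 4$. The coefficient $c_1 = 1/8 + 18$ of $fg$ attains $|c_1|_2 = 8 = |f||g|$, in line with the choice of $r$ and $s$ in the proof ($r = 0$, $s = 1$).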
It also follows at once from Lemma 8 that if a sequence $(a_n)$ of elements of $F$
converges to a limit $a \ne 0$, then $|a_n| = |a|$ for all large $n$. Hence the value group of
the field $F$ is the same as the value group of its completion $\bar F$. The next lemma has an
especially appealing corollary.


Lemma 10 Let $F$ be a field with a non-archimedean absolute value $|\cdot|$. Then a
sequence $(a_n)$ of elements of $F$ is a fundamental sequence if and only if
$\lim_{n\to\infty} |a_{n+1} - a_n| = 0$.


Proof If $|a_{n+1} - a_n| \to 0$, then for each $\varepsilon > 0$ there is a corresponding positive
integer $N = N(\varepsilon)$ such that

$$|a_{n+1} - a_n| < \varepsilon \quad \text{for } n \ge N.$$
For any integer $k \ge 1$,
$$a_{n+k} - a_n = (a_{n+1} - a_n) + (a_{n+2} - a_{n+1}) + \cdots + (a_{n+k} - a_{n+k-1})$$
and hence
$$|a_{n+k} - a_n| \le \max\{|a_{n+1} - a_n|, |a_{n+2} - a_{n+1}|, \ldots, |a_{n+k} - a_{n+k-1}|\} < \varepsilon \quad \text{for } n \ge N.$$

Thus $(a_n)$ is a fundamental sequence. The converse follows at once from the definition
of a fundamental sequence.


Corollary 11 In a field $F$ with a complete non-archimedean absolute value $|\cdot|$, an
infinite series $\sum_{n=1}^{\infty} a_n$ of elements of $F$ is convergent if and only if $|a_n| \to 0$.


Let $F$ be a field with a nontrivial non-archimedean absolute value $|\cdot|$ and put


$$R = \{a \in F : |a| \le 1\},$$
$$M = \{a \in F : |a| < 1\},$$
$$U = \{a \in F : |a| = 1\}.$$


Then $R$ is the union of the disjoint nonempty subsets $M$ and $U$. It follows from the definition of a non-archimedean absolute value that $R$ is a (commutative) ring containing
the unit element of $F$ and that, for any nonzero $a \in F$, either $a \in R$ or $a^{-1} \in R$ (or
both). Moreover $M$ is an ideal of $R$ and $U$ is a multiplicative group, consisting of all
$a \in R$ such that also $a^{-1} \in R$. Thus a proper ideal of $R$ cannot contain an element of
$U$ and hence $M$ is the unique maximal ideal of $R$. Consequently (see again Chapter I,
§8) the quotient $R/M$ is a field.

We call R the valuation ring, M the valuation ideal, and R/M the residue field of 
the valued field F. 

We draw attention to the fact that the ‘closed unit ball’ $R$ is both open and closed
in the topology induced by the absolute value. For if $a \in R$ and $|b - a| < 1$, then also
$b \in R$. Furthermore, if $a_n \in R$ and $a_n \to a$ then $a \in R$, since $|a_n| = |a|$ for all large $n$ when $a \ne 0$.
Similarly, the ‘open unit ball’ $M$ is also both open and closed.

In particular, let $F = \mathbb{Q}$ be the field of rational numbers and $|\cdot| = |\cdot|_p$ the $p$-adic
absolute value. In this case the valuation ring $R = R_p$ is the set of all rational numbers
$m/n$, where $m$ and $n$ are relatively prime integers, $n > 0$ and $p$ does not divide $n$. The
valuation ideal is $M = pR_p$, and the residue field $\mathbb{F}_p = R_p/pR_p$ is the finite field with
$p$ elements.

As another example, let $F = K(t)$ be the field of rational functions with
coefficients from an arbitrary field $K$ and let $|\cdot| = |\cdot|_t$ be the absolute value
considered in example (iii) of §1 for the irreducible polynomial $p(t) = t$. In this
case the valuation ring $R$ is the set of all rational functions $f = g/h$, where $g$ and $h$
are relatively prime polynomials and $h$ has nonzero constant term. The valuation ideal
is $M = tR$ and the residue field $R/M$ is isomorphic to $K$, since $f(t) \equiv f(0) \bmod M$
(i.e., $f(t) - f(0) \in M$).

Let $\bar F$ be the completion of $F$. If $\bar R$ and $\bar M$ are the valuation ring and valuation
ideal of $\bar F$, then evidently

$$R = \bar R \cap F, \quad M = \bar M \cap F.$$

Moreover $R$ is dense in $\bar R$ since, if $0 < \varepsilon < 1$, for any $\alpha \in \bar R$ there exists $a \in F$
such that $|\alpha - a| < \varepsilon$, and then $a \in R$ (and $a - \alpha \in \bar M$). Furthermore the residue
fields $R/M$ and $\bar R/\bar M$ are isomorphic. For the map $a + M \mapsto a + \bar M$ $(a \in R)$ is an
isomorphism of $R/M$ onto a subfield of $\bar R/\bar M$ and this subfield is not proper (by the
preceding bracketed remark).

The valuation ring of the field $\mathbb{Q}_p$ of $p$-adic numbers will be denoted by $\mathbb{Z}_p$ and
its elements will be called $p$-adic integers. The ring $\mathbb{Z}$ of ordinary integers is dense in
$\mathbb{Z}_p$, and the residue field of $\mathbb{Q}_p$ is the finite field $\mathbb{F}_p$ with $p$ elements, since this is the
residue field of $\mathbb{Q}$.

Similarly, the valuation ring of the field $K((t))$ of all formal Laurent series is the
ring $K[[t]]$ of all formal power series $\sum_{n \ge 0} a_n t^n$. The polynomial ring $K[t]$ is dense
in $K[[t]]$, and the residue field of $K((t))$ is $K$, since this is the residue field of $K(t)$
with the absolute value $|\cdot|_t$.

A non-archimedean absolute value $|\cdot|$ on a field $F$ will be said to be discrete if
there exists some $\delta \in (0, 1)$ such that $a \in F$ and $|a| \ne 1$ implies either $|a| < 1 - \delta$ or
$|a| > 1 + \delta$. (This situation cannot arise for archimedean absolute values.)

A non-archimedean absolute value need not be discrete, but the examples of non-archimedean absolute values which we have given are all discrete.

Lemma 12 Let $F$ be a field with a nontrivial non-archimedean absolute value $|\cdot|$,
and let $R$ and $M$ be the corresponding valuation ring and valuation ideal. Then the
absolute value is discrete if and only if $M$ is a principal ideal. In this case the only
nontrivial proper ideals of $R$ are the powers $M^k$ $(k = 1, 2, \ldots)$.


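Proof Suppose first that the absolute value $|\cdot|$ is discrete and put $\mu = \sup_{a \in M} |a|$.
Then $0 < \mu < 1$ and the supremum is attained, since $|a_n| \to \mu$ implies $|a_{n+1}a_n^{-1}| \to 1$
(if the supremum were not attained, we could choose $a_n \in M$ with $|a_n|$ strictly increasing to $\mu$,
and then $1 < |a_{n+1}a_n^{-1}| \to 1$ would contradict discreteness).
Thus $\mu = |\pi|$ for some $\pi \in M$. For any $a \in M$ we have $|a\pi^{-1}| \le 1$ and hence
$a = \pi a'$, where $a' \in R$. Thus $M$ is a principal ideal with generating element $\pi$.

Suppose next that $M$ is a principal ideal with generating element $\pi$. If $|a| < 1$,
then $a \in M$. Thus $a = \pi a'$, where $a' \in R$, and hence $|a| \le |\pi|$. Similarly if $|a| > 1$,
then $a^{-1} \in M$. Thus $|a^{-1}| \le |\pi|$ and hence $|a| \ge |\pi|^{-1}$. This proves that the absolute
value is discrete.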
We now show that, for any nonzero $a \in M$, there is a positive integer $k$ such that
$|a| = |\pi|^k$. In fact we can choose $k$ so that

$$|\pi|^{k+1} < |a| \le |\pi|^k.$$

Then $|\pi| < |a\pi^{-k}| \le 1$, which implies $|a\pi^{-k}| = 1$ and hence $|a| = |\pi|^k$. Thus
the value group of the valued field $F$ is the infinite cyclic group generated by $|\pi|$. The
final statement of the lemma follows immediately.


It is clear that if an absolute value $|\cdot|$ on a field $F$ is discrete, then its extension to
the completion $\bar F$ of $F$ is also discrete. Moreover, if $\pi$ is a generating element for the
valuation ideal of $F$, then it is also a generating element for the valuation ideal of $\bar F$.

Suppose now that not only is $M = (\pi)$ a principal ideal, but the residue field
$k = R/M$ is finite. Then there exists a finite set $S \subseteq R$, with the same cardinality as
$k$, such that for each $a \in R$ there is a unique $\alpha \in S$ for which $|a - \alpha| < 1$. Since the
elements of $k$ are the cosets $\alpha + M$, where $\alpha \in S$, we call $S$ a set of representatives in $R$
of the residue field. It is convenient to choose $\alpha = 0$ as the representative for $M$ itself.

Under these hypotheses a rather explicit representation for the elements of the
valued field can be derived:


Proposition 13 Let $F$ be a field with a non-archimedean absolute value $|\cdot|$, and let $R$
and $M$ be the corresponding valuation ring and valuation ideal. Suppose the absolute
value is discrete, i.e. $M = (\pi)$ is a principal ideal. Suppose also that the residue field
$k = R/M$ is finite, and let $S \subseteq R$ be a set of representatives of $k$ with $0 \in S$.

Then for each $a \in F$ there exists a unique bi-infinite sequence $(\alpha_n)_{n\in\mathbb{Z}}$, where
$\alpha_n \in S$ for all $n \in \mathbb{Z}$ and $\alpha_n \ne 0$ for at most finitely many $n < 0$, such that

$$a = \sum_{n\in\mathbb{Z}} \alpha_n \pi^n.$$

If $N$ is the least integer $n$ such that $\alpha_n \ne 0$, then $|a| = |\pi|^N$. In particular, $a \in R$ if
and only if $\alpha_n = 0$ for all $n < 0$.

If $F$ is complete then, for any such bi-infinite sequence $(\alpha_n)$, the series $\sum_{n\in\mathbb{Z}} \alpha_n \pi^n$
is convergent with sum $a \in F$.


Proof Suppose $a \in F$ and $a \ne 0$. Then $|a| = |\pi|^N$ for some $N \in \mathbb{Z}$ and hence
$|a\pi^{-N}| = 1$. There is a unique $\alpha_N \in S$ such that $|a\pi^{-N} - \alpha_N| < 1$. Then $|\alpha_N| = 1$,
$|a\pi^{-N} - \alpha_N| \le |\pi|$ and

$$a\pi^{-N} = \alpha_N + a_1\pi,$$
where $a_1 \in R$. Similarly there is a unique $\alpha_{N+1} \in S$ such that
$$a_1 = \alpha_{N+1} + a_2\pi,$$
where $a_2 \in R$. Continuing in this way we obtain, for any positive integer $n$,

$$a = \alpha_N\pi^N + \alpha_{N+1}\pi^{N+1} + \cdots + \alpha_{N+n}\pi^{N+n} + a_{n+1}\pi^{N+n+1},$$

where $\alpha_N, \alpha_{N+1}, \ldots, \alpha_{N+n} \in S$ and $a_{n+1} \in R$. Since $|a_{n+1}\pi^{N+n+1}| \to 0$ as $n \to \infty$,
the series $\sum_{n \ge N} \alpha_n \pi^n$ converges with sum $a$.

On the other hand, it is clear that if $a = \sum_{n \ge N} \alpha_n \pi^n$, where $\alpha_n \in S$ and $\alpha_N \ne 0$,
then the coefficients $\alpha_n$ must be determined in the above way.

If $F$ is complete then, by Corollary 11, any series $\sum_{n \ge N} \alpha_n \pi^n$ is convergent, since
$|\alpha_n \pi^n| \to 0$ as $n \to \infty$.


Corollary 14 Every $a \in \mathbb{Q}_p$ can be uniquely expressed in the form


$$a = \sum_{n\in\mathbb{Z}} \alpha_n p^n,$$


where $\alpha_n \in \{0, 1, \ldots, p-1\}$ and $\alpha_n \ne 0$ for at most finitely many $n < 0$. Conversely,
any such series is convergent with sum $a \in \mathbb{Q}_p$. Furthermore $a \in \mathbb{Z}_p$ if and only if
$\alpha_n = 0$ for all $n < 0$.
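The proof of Proposition 13 is effectively an algorithm. At each step one picks the representative of the residue class and then divides by $\pi$. For $F = \mathbb{Q}$, $\pi = p$ and $S = \{0, 1, \ldots, p-1\}$, it produces the digits of Corollary 14 for a rational number. The sketch below assumes exact rational input via `fractions.Fraction`. The names `vp` and `padic_digits` are ours. It returns $N$ together with the first `count` digits $\alpha_N, \alpha_{N+1}, \ldots$.

```python
from fractions import Fraction

def vp(a, p):
    """p-adic valuation of a nonzero rational a."""
    v, num, den = 0, a.numerator, a.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v

def padic_digits(a, p, count):
    """Return (N, [alpha_N, ..., alpha_{N+count-1}]) with a = sum_n alpha_n p^n."""
    a = Fraction(a)
    if a == 0:
        raise ValueError("all digits of 0 vanish")
    N = vp(a, p)
    u = a / Fraction(p) ** N          # |u|_p = 1, so u lies in R_p
    digits = []
    for _ in range(count):
        # alpha = the representative in {0, ..., p-1} of u modulo p
        alpha = (u.numerator * pow(u.denominator, -1, p)) % p
        digits.append(alpha)
        u = (u - alpha) / p           # still in R_p, since u - alpha lies in pR_p
    return N, digits
```

For example, `padic_digits(Fraction(1, 3), 5, 6)` returns `(0, [2, 3, 1, 3, 1, 3])`, so $1/3 = 2 + 3\cdot5 + 1\cdot5^2 + 3\cdot5^3 + \cdots$ in $\mathbb{Q}_5$, and `padic_digits(-1, 7, 4)` returns `(0, [6, 6, 6, 6])`. The modular inverse `pow(x, -1, p)` requires Python 3.8 or later.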


Thus we have now arrived at Hensel’s starting-point. It is not difficult to show
that if $a = \sum_{n\in\mathbb{Z}} \alpha_n p^n \in \mathbb{Q}_p$, then actually $a \in \mathbb{Q}$ if and only if the sequence of
coefficients $(\alpha_n)$ is eventually periodic, i.e. there exist integers $h > 0$ and $m$ such that
$\alpha_{n+h} = \alpha_n$ for all $n \ge m$.
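For one direction of this statement, eventually periodic digits give a rational number. If $\alpha_{n+h} = \alpha_n$ for $n \ge m$, the tail is a geometric series with ratio $p^h$. It converges in $\mathbb{Q}_p$ because $|p^h|_p < 1$, and its sum is (block)$/(1 - p^h)$. Here is a sketch of that computation. The function name and argument conventions are ours: `prefix` lists the digits before the repeating block, `period` is the repeating block, and `N` is the index of the first digit.

```python
from fractions import Fraction

def rational_from_periodic_digits(prefix, period, p, N=0):
    """Sum of alpha_N p^N + alpha_{N+1} p^{N+1} + ... when the digits are
    `prefix` followed by `period` repeated forever (summed p-adically)."""
    if not period:
        raise ValueError("period must be nonempty")
    P = Fraction(p)
    head = sum(Fraction(d) * P ** i for i, d in enumerate(prefix))
    block = sum(Fraction(d) * P ** j for j, d in enumerate(period))
    # geometric series block * (1 + p^h + p^{2h} + ...) = block / (1 - p^h)
    tail = P ** len(prefix) * block / (1 - P ** len(period))
    return P ** N * (head + tail)
```

For example, the digits $2, 3, 1, 3, 1, \ldots$ in base 5 give `rational_from_periodic_digits([2], [3, 1], 5) == Fraction(1, 3)`, and a constant block of $p - 1$ gives $-1$.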

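From Corollary 14 we can deduce again that the ring $\mathbb{Z}$ of ordinary integers is
dense in the ring $\mathbb{Z}_p$ of $p$-adic integers. For, if

$$a = \sum_{n \ge 0} \alpha_n p^n \in \mathbb{Z}_p,$$

where $\alpha_n \in \{0, 1, \ldots, p-1\}$, then

$$a_k = \sum_{n=0}^{k} \alpha_n p^n \in \mathbb{Z}$$

and $|a - a_k|_p < p^{-k}$.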
The analogy between p-adic absolute values and ordinary absolute values suggests
that methods well-known in analysis may be applied also to arithmetic problems. We 
will illustrate this by showing how Newton’s method for finding the real or complex 
roots of an equation can also be used to find p-adic roots. In fact the ultrametric 
inequality makes it possible to establish a stronger convergence criterion than in the 
classical case. The following proposition is modestly known as ‘Hensel’s lemma’. 


Proposition 15 Let $F$ be a field with a complete non-archimedean absolute value $|\cdot|$
and let $R$ be its valuation ring. Let

$$f(x) = c_n x^n + c_{n-1}x^{n-1} + \cdots + c_0$$

be a polynomial with coefficients $c_0, \ldots, c_n \in R$ and let
$$f_1(x) = nc_n x^{n-1} + (n-1)c_{n-1}x^{n-2} + \cdots + c_1$$

be its formal derivative. If $|f(a_0)| < |f_1(a_0)|^2$ for some $a_0 \in R$, then the equation
$f(a) = 0$ has a unique solution $a \in R$ such that $|a - a_0| < |f_1(a_0)|$.


Proof We consider first the existence of $a$ and postpone discussion of its uniqueness.
Put

$$\sigma := |f_1(a_0)| > 0, \quad \theta := \sigma^{-2}|f(a_0)| < 1,$$
and let $D_\theta$ denote the set
$$\{a \in R : |f_1(a)| = \sigma, \ |f(a)| \le \theta\sigma^2\}.$$

Thus $a_0 \in D_\theta$, and $D_{\theta'} \subseteq D_\theta$ if $\theta' < \theta$. We are going to show that, if $\theta \in (0, 1)$, then
the ‘Newton’ map

$$Ta = a^* := a - f(a)/f_1(a)$$

maps $D_\theta$ into $D_{\theta^2}$.
We can write

$$f(x + y) = f(x) + f_1(x)y + \cdots + f_n(x)y^n,$$

where $f_1(x)$ has already been defined and $f_2(x), \ldots, f_n(x)$ are also polynomials with
coefficients from $R$. We substitute

$$x = a, \quad y = b := -f(a)/f_1(a),$$

where $a \in D_\theta$. Then $|f_j(a)| \le 1$, since $a \in R$ and $f_j(x) \in R[x]$ $(j = 1, \ldots, n)$.
Furthermore

$$|b| = \sigma^{-1}|f(a)| \le \theta\sigma < \sigma.$$
Thus $b \in R$. Since $f(a) + f_1(a)b = 0$, it follows that $a^* = a + b$ satisfies

$$|f(a^*)| \le \max_{2 \le j \le n} |f_j(a)b^j| \le |b|^2 = \sigma^{-2}|f(a)|^2 \le \theta^2\sigma^2.$$

Similarly, since $f_1(a + b) - f_1(a)$ can be written as a polynomial in $b$ with coefficients
from $R$ and with no constant term,

$$|f_1(a + b) - f_1(a)| \le |b| < \sigma = |f_1(a)|$$

and hence $|f_1(a^*)| = \sigma$. This completes the proof that $TD_\theta \subseteq D_{\theta^2}$.
Now put $a_k = T^k a_0$, so that

$$a_{k+1} - a_k = -f(a_k)/f_1(a_k).$$

It follows by induction from what we have proved that

$$|f(a_k)| \le \sigma^2\theta^{2^k}.$$

Since $\theta < 1$ and $|a_{k+1} - a_k| = \sigma^{-1}|f(a_k)|$, this shows that $\{a_k\}$ is a fundamental
sequence. Hence, since F is complete, $a_k \to a$ for some $a \in R$. Evidently $f(a) = 0$
and $|f_1(a)| = \sigma$. Since, for every $k \ge 1$,

$$|a_k - a_0| \le \max_{1 \le j \le k} |a_j - a_{j-1}| \le \theta\sigma,$$

we also have $|a - a_0| \le \theta\sigma < \sigma$.


To prove uniqueness, assume $f(\tilde a) = 0$ for some $\tilde a \ne a$ such that $|\tilde a - a_0| < \sigma$. If
we put $b = \tilde a - a$, then

$$0 = f(\tilde a) - f(a) = f_1(a)b + \cdots + f_n(a)b^n.$$

From $b = \tilde a - a_0 - (a - a_0)$ we obtain $|b| < \sigma$. Since $b \ne 0$ and $|f_j(a)| \le 1$, it
follows that, for $j \ge 2$,

$$|f_j(a)b^j| \le |b|^2 < \sigma|b| = |f_1(a)b|.$$

But this implies

$$|f(\tilde a) - f(a)| = |f_1(a)b| > 0,$$

which is a contradiction.
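The existence half of this proof is constructive, and the iteration $a_{k+1} = Ta_k$ can be run directly. The Python sketch below is our own illustration. The helper names `vp`, `poly_eval`, `formal_derivative` and `newton_padic` are hypothetical and do not come from the text. It applies the Newton map over $\mathbb{Q} \subset \mathbb{Q}_p$ with exact `Fraction` arithmetic, so no p-adic truncation is involved, and it reports p-adic valuations. For $f(x) = x^2 - 17$, $p = 2$ and $a_0 = 1$ we have $\sigma = 2^{-1}$ and $\theta = 2^{-2}$, so the bound $|f(a_k)| \le \sigma^2\theta^{2^k}$ predicts $v_2(f(a_k)) \ge 2 + 2^{k+1}$.

```python
from fractions import Fraction


def vp(x, p):
    """p-adic valuation of a nonzero rational number x."""
    x = Fraction(x)
    if x == 0:
        raise ValueError("v_p(0) is +infinity")
    v, num, den = 0, x.numerator, x.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v


def poly_eval(coeffs, x):
    """Evaluate c0 + c1*x + ... + cn*x^n (coeffs = [c0, ..., cn]) by Horner's rule."""
    acc = Fraction(0)
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


def formal_derivative(coeffs):
    """Coefficients of f_1 = n*c_n*x^(n-1) + ... + c_1."""
    return [j * c for j, c in enumerate(coeffs)][1:]


def newton_padic(coeffs, a0, p, steps):
    """Iterates a_0, a_1 = T a_0, ... of the Newton map a -> a - f(a)/f_1(a)."""
    df = formal_derivative(coeffs)
    a = Fraction(a0)
    fa = poly_eval(coeffs, a)
    # Hypothesis of Proposition 15: |f(a0)| < |f_1(a0)|^2, i.e. v(f(a0)) > 2 v(f_1(a0)).
    if fa != 0 and vp(fa, p) <= 2 * vp(poly_eval(df, a), p):
        raise ValueError("hypothesis of Hensel's lemma not satisfied")
    iterates = [a]
    for _ in range(steps):
        fa = poly_eval(coeffs, a)
        if fa == 0:  # an exact root has been reached
            break
        a = a - fa / poly_eval(df, a)
        iterates.append(a)
    return iterates


f = [-17, 0, 1]  # x^2 - 17, and 17 = 1 mod 8
for k, a in enumerate(newton_padic(f, 1, 2, 4)):
    print(k, a, vp(poly_eval(f, a), 2))
```

In this example the valuations are 4, 6, 10, 18, 34, so the bound is attained exactly. The number of correct 2-adic digits roughly doubles at each step, just as for Newton's method over $\mathbb{R}$.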


As an application of Proposition 15 we will determine which elements of the field
$\mathbb{Q}_p$ of p-adic numbers are squares. Since $b = a^2$ implies $b = p^{2v}b'$, where $v \in \mathbb{Z}$ and
$|b'|_p = 1$, we may restrict attention to the case $|b|_p = 1$.


Proposition 16 Suppose $b \in \mathbb{Q}_p$ and $|b|_p = 1$.
If $p \ne 2$, then $b = a^2$ for some $a \in \mathbb{Q}_p$ if and only if $|b - a_0^2|_p < 1$ for some $a_0 \in \mathbb{Z}$.
If $p = 2$, then $b = a^2$ for some $a \in \mathbb{Q}_2$ if and only if $|b - 1|_2 \le 2^{-3}$.


Proof Suppose first that $p \ne 2$. If $b = a^2$ for some $a \in \mathbb{Q}_p$, then $|a|_p = 1$ and
$|a - a_0|_p < 1$ for some $a_0 \in \mathbb{Z}$, since $\mathbb{Z}$ is dense in $\mathbb{Z}_p$. Hence $|a_0|_p = 1$ and

$$|b - a_0^2|_p = |a - a_0|_p|a + a_0|_p \le |a - a_0|_p < 1.$$

Conversely, suppose $|b - a_0^2|_p < 1$ for some $a_0 \in \mathbb{Z}$. Then $|a_0^2|_p = 1$ and so
$|a_0|_p = 1$. In Proposition 15 take $F = \mathbb{Q}_p$ and $f(x) = x^2 - b$. The hypotheses of the
proposition are satisfied, since $|f(a_0)|_p < 1$ and $|f_1(a_0)|_p = |2a_0|_p = 1$, and hence
$b = a^2$ for some $a \in \mathbb{Q}_p$.

Suppose next that $p = 2$. If $b = a^2$ for some $a \in \mathbb{Q}_2$, then $|a|_2 = 1$ and
$|a - a_0|_2 \le 2^{-3}$ for some $a_0 \in \mathbb{Z}$, since $\mathbb{Z}$ is dense in $\mathbb{Z}_2$. Hence $|a_0|_2 = 1$ and

$$|b - a_0^2|_2 = |a - a_0|_2|a + a_0|_2 \le |a - a_0|_2 \le 2^{-3}.$$

Since $a_0$ is odd, we have $a_0 \equiv \pm 1 \bmod 4$ and $a_0^2 \equiv 1 \bmod 8$. Hence

$$|b - 1|_2 \le \max\{|b - a_0^2|_2,\ |a_0^2 - 1|_2\} \le 2^{-3}.$$

Conversely, suppose $|b - 1|_2 \le 2^{-3}$. In Proposition 15 take $F = \mathbb{Q}_2$ and
$f(x) = x^2 - b$. The hypotheses of the proposition are satisfied, since $|f(1)|_2 \le 2^{-3}$
and $|f_1(1)|_2 = 2^{-1}$, and hence $b = a^2$ for some $a \in \mathbb{Q}_2$.


Corollary 17 Let b be an integer not divisible by the prime p.

If $p \ne 2$, then $b = a^2$ for some $a \in \mathbb{Q}_p$ if and only if b is a quadratic residue
mod p.

If $p = 2$, then $b = a^2$ for some $a \in \mathbb{Q}_2$ if and only if $b \equiv 1 \bmod 8$.
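Corollary 17, combined with the reduction $b = p^{2v}b'$ made before Proposition 16, gives a complete test for whether a nonzero integer is a square in $\mathbb{Q}_p$. The following is a minimal Python sketch that assumes p is prime. The function name is our own. For odd p it decides quadratic residuosity by Euler's criterion $b^{(p-1)/2} \equiv 1 \pmod p$, a standard fact that is not proved in this section.

```python
def is_square_in_Qp(b, p):
    """Decide whether the nonzero integer b is a square in Q_p (p prime)."""
    if b == 0:
        raise ValueError("b must be nonzero")
    v = 0
    while b % p == 0:  # write b = p^v * b' with p not dividing b'
        b //= p
        v += 1
    if v % 2 == 1:  # a square a^2 has even valuation 2*v_p(a)
        return False
    if p == 2:
        return b % 8 == 1  # Corollary 17, case p = 2
    # Corollary 17, odd p: b' must be a quadratic residue mod p (Euler's criterion).
    return pow(b % p, (p - 1) // 2, p) == 1


print(is_square_in_Qp(-1, 5), is_square_in_Qp(-1, 3), is_square_in_Qp(17, 2))
```

For instance, $-1$ is a square in $\mathbb{Q}_5$ but not in $\mathbb{Q}_3$, and $17$ is a square in $\mathbb{Q}_2$. The numbers $1 - p$ (for odd p) and $1 - 2^3 = -7$ used below to show that $\mathbb{Q}_p$ is not ordered both pass the test.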


It follows from Corollary 17 that $\mathbb{Q}_p$ cannot be given the structure of an ordered
field. For, if p is odd, then $1 - p = a^2$ for some $a \in \mathbb{Q}_p$ and hence

$$a^2 + 1 + \cdots + 1 = 0,$$

where there are $p - 1$ 1’s. Similarly, if $p = 2$, then $1 - 2^3 = a^2$ for some $a \in \mathbb{Q}_2$ and
the same relation holds with 7 1’s.

Suppose again that F is a field with a complete non-archimedean absolute value | |.
Let R and M be the corresponding valuation ring and valuation ideal, and let $k = R/M$
be the residue field. For any $a \in R$ we will denote by $\bar a$ the corresponding element
$a + M$ of k, and for any polynomial

$$f(x) = c_n x^n + c_{n-1} x^{n-1} + \cdots + c_0$$

with coefficients $c_0, \ldots, c_n \in R$, we will denote by

$$\bar f(x) = \bar c_n x^n + \bar c_{n-1} x^{n-1} + \cdots + \bar c_0$$

the polynomial whose coefficients are the corresponding elements of k.
The hypotheses of Proposition 15 are certainly satisfied if $|f(a_0)| < 1 = |f_1(a_0)|$.
In this case Proposition 15 says that if

$$\bar f(x) = (x - \bar a_0)\bar h_0(x),$$

where $a_0 \in R$, $h_0(x) \in R[x]$ and $h_0(a_0) \notin M$, then

$$f(x) = (x - a)h(x),$$

where $a - a_0 \in M$ and $h(x) \in R[x]$. In other words, the factorization of $\bar f(x)$ in $k[x]$
can be ‘lifted’ to a factorization of $f(x)$ in $R[x]$. This form of Hensel’s lemma can
be generalized to factorizations where neither factor is linear, and the result is again
known as Hensel’s lemma!


Proposition 18 Let F be a field with a complete non-archimedean absolute value | |.
Let R and M be the valuation ring and valuation ideal of F, and $k = R/M$ the residue
field.
Let $f \in R[x]$ be a polynomial with coefficients in R and suppose there exist relatively
prime polynomials $\varphi, \psi \in k[x]$, with $\varphi$ monic and $\partial(\varphi) > 0$, such that $\bar f = \varphi\psi$.
Then there exist polynomials $g, h \in R[x]$, with g monic and $\partial(g) = \partial(\varphi)$, such
that $\bar g = \varphi$, $\bar h = \psi$ and $f = gh$.
Proof Put $n = \partial(f)$ and $m = \partial(\varphi)$. Then $\partial(\psi) = \partial(\bar f) - \partial(\varphi) \le n - m$. There exist
polynomials $g_1, h_1 \in R[x]$, with $g_1$ monic, $\partial(g_1) = m$ and $\partial(h_1) \le n - m$, such that
$\bar g_1 = \varphi$, $\bar h_1 = \psi$. Since $\varphi, \psi$ are relatively prime, there exist polynomials $\chi, \omega \in k[x]$
such that

$$\chi\varphi + \omega\psi = 1,$$

and there exist polynomials $u, v \in R[x]$ such that $\bar u = \chi$, $\bar v = \omega$. Thus

$$f - g_1h_1 \in M[x], \qquad ug_1 + vh_1 - 1 \in M[x].$$

If $f = g_1h_1$, there is nothing more to do. Otherwise, let $\pi$ be the coefficient of $f - g_1h_1$
or of $ug_1 + vh_1 - 1$ which has maximum absolute value. Then

$$f - g_1h_1 \in \pi R[x], \qquad ug_1 + vh_1 - 1 \in \pi R[x].$$

We are going to construct inductively polynomials $g_j, h_j \in R[x]$ such that
(i) $\bar g_j = \varphi$, $\bar h_j = \psi$;
(ii) $g_j$ is monic and $\partial(g_j) = m$, $\partial(h_j) \le n - m$;
(iii) $g_j - g_{j-1} \in \pi^{j-1}R[x]$, $h_j - h_{j-1} \in \pi^{j-1}R[x]$;
(iv) $f - g_jh_j \in \pi^jR[x]$.

This holds already for $j = 1$ with $g_0 = h_0 = 0$. Assume that, for some $k \ge 2$, it
holds for all $j < k$ and put $f - g_jh_j = \pi^je_j$, where $e_j \in R[x]$. Since $g_1$ is monic,
the Euclidean algorithm provides polynomials $q_k, r_k \in R[x]$ such that

$$e_{k-1}v = q_kg_1 + r_k, \qquad \partial(r_k) < \partial(g_1) = m.$$

Let $w_k \in R[x]$ be a polynomial of minimal degree such that all coefficients of
$e_{k-1}u + q_kh_1 - w_k$ have absolute value at most $|\pi|$. Then

$$w_kg_1 + r_kh_1 - e_{k-1} = (ug_1 + vh_1 - 1)e_{k-1} - (e_{k-1}u + q_kh_1 - w_k)g_1 \in \pi R[x].$$

We will show that $\partial(w_k) \le n - m$. Indeed otherwise

$$\partial(w_kg_1) > n \ge \partial(r_kh_1 - e_{k-1})$$

and hence, since $g_1$ is monic, $w_kg_1 + r_kh_1 - e_{k-1}$ has the same leading coefficient as
$w_k$. Consequently the leading coefficient of $w_k$ is in $\pi R$. Thus the polynomial obtained
from $w_k$ by omitting the term of highest degree satisfies the same requirements as $w_k$,
which is a contradiction.

If we put

$$g_k = g_{k-1} + \pi^{k-1}r_k, \qquad h_k = h_{k-1} + \pi^{k-1}w_k,$$

then (i)–(iii) are evidently satisfied for $j = k$. Moreover

$$f - g_kh_k = -\pi^{k-1}(w_kg_{k-1} + r_kh_{k-1} - e_{k-1}) - \pi^{2k-2}r_kw_k$$

and

$$w_kg_{k-1} + r_kh_{k-1} - e_{k-1} = w_kg_1 + r_kh_1 - e_{k-1} + w_k(g_{k-1} - g_1) + r_k(h_{k-1} - h_1) \in \pi R[x].$$

Hence, since $2k - 2 \ge k$, also (iv) is satisfied for $j = k$.
Put

$$g_j(x) = x^m + \sum_{i=0}^{m-1} \alpha_i^{(j)} x^i, \qquad h_j(x) = \sum_{i=0}^{n-m} \beta_i^{(j)} x^i.$$

By (iii), the sequences $(\alpha_i^{(j)})$ and $(\beta_i^{(j)})$ are fundamental sequences for each i and
hence, since F is complete, there exist $\alpha_i, \beta_i \in R$ such that

$$\alpha_i^{(j)} \to \alpha_i, \qquad \beta_i^{(j)} \to \beta_i \quad \text{as } j \to \infty.$$

If

$$g(x) = x^m + \sum_{i=0}^{m-1} \alpha_i x^i, \qquad h(x) = \sum_{i=0}^{n-m} \beta_i x^i,$$

then, for each $j \ge 1$,

$$g - g_j \in \pi^jR[x], \qquad h - h_j \in \pi^jR[x].$$

Since

$$f - gh = f - g_jh_j - (g - g_j)h - g_j(h - h_j),$$

it follows that $f - gh \in \pi^jR[x]$ for each $j \ge 1$. Hence $f = gh$. It is obvious that g
and h have the other required properties.
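The inductive construction in this proof is essentially the linear Hensel lifting used in computer algebra. For the sake of a runnable example, the Python sketch below assumes $F = \mathbb{Q}_p$, $R = \mathbb{Z}_p$, $\pi = p$ and polynomials with integer coefficients. All helper names are hypothetical. As in the proof, cofactors u, v with $ug + vh \equiv 1 \pmod p$ are computed once. At each step the error $e = (f - gh)/p^j$ is split as $ev = qg + r$, and g and h are corrected by $p^j r$ and $p^j w$, where $w \equiv eu + qh \pmod p$. Reducing w to coefficients in $\{0, \ldots, p-1\}$ gives a representative of least degree, as required of $w_k$. Unlike the proof, all auxiliary computations are done modulo p.

```python
# Polynomials are lists of integers, lowest degree first: [c0, c1, ..., cn].


def trim(a):
    """Drop zero leading coefficients."""
    a = list(a)
    while a and a[-1] == 0:
        a.pop()
    return a


def add(a, b):
    n = max(len(a), len(b))
    return trim([(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) for i in range(n)])


def scale(c, a):
    return trim([c * x for x in a])


def mul(a, b):
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return trim(out)


def reduce_mod(a, m):
    """Reduce every coefficient into {0, ..., m-1}."""
    return trim([x % m for x in a])


def divmod_mod_p(a, b, p):
    """Division with remainder in F_p[x]; b must be nonzero mod p."""
    a, b = reduce_mod(a, p), reduce_mod(b, p)
    inv = pow(b[-1], -1, p)
    q = [0] * max(len(a) - len(b) + 1, 0)
    while a and len(a) >= len(b):
        shift = len(a) - len(b)
        c = a[-1] * inv % p
        q[shift] = c
        a = reduce_mod(add(a, scale(-c, [0] * shift + b)), p)  # leading term cancels
    return trim(q), a


def bezout_mod_p(g, h, p):
    """Return (u, v) with u*g + v*h = 1 in F_p[x]; requires gcd(g, h) = 1 mod p."""
    r0, r1 = reduce_mod(g, p), reduce_mod(h, p)
    u0, u1, v0, v1 = [1], [], [], [1]  # invariant: r_i = u_i*g + v_i*h mod p
    while r1:
        q, r = divmod_mod_p(r0, r1, p)
        r0, r1 = r1, r
        u0, u1 = u1, reduce_mod(add(u0, scale(-1, mul(q, u1))), p)
        v0, v1 = v1, reduce_mod(add(v0, scale(-1, mul(q, v1))), p)
    if len(r0) != 1:
        raise ValueError("the factors are not relatively prime mod p")
    inv = pow(r0[0], -1, p)
    return reduce_mod(scale(inv, u0), p), reduce_mod(scale(inv, v0), p)


def hensel_lift(f, g, h, p, k):
    """Given f = g*h mod p, g monic, g and h coprime mod p, return (G, H) with
    G monic, deg G = deg g, G = g and H = h mod p, and f = G*H mod p**k."""
    u, v = bezout_mod_p(g, h, p)
    g, h, pj = list(g), list(h), p
    for _ in range(k - 1):
        e = add(f, scale(-1, mul(g, h)))  # f - g*h = p^j * e_j
        if any(x % pj for x in e):
            raise ValueError("f is not congruent to g*h")
        e = reduce_mod([x // pj for x in e], p)
        q, r = divmod_mod_p(mul(e, v), g, p)  # e*v = q*g + r, deg r < deg g
        w = reduce_mod(add(mul(e, u), mul(q, h)), p)  # w = e*u + q*h mod p
        g = add(g, scale(pj, r))  # still monic, because deg r < deg g
        h = add(h, scale(pj, w))
        pj *= p
    return g, h


f = [1, 0, 0, 0, 1]  # x^4 + 1
G, H = hensel_lift(f, [2, 1, 1], [2, 2, 1], 3, 6)
print(G, H)
```

Here $x^4 + 1 \equiv (x^2 + x + 2)(x^2 + 2x + 2) \pmod 3$, and the two factors are distinct irreducible quadratics over $\mathbb{F}_3$, hence relatively prime. By Proposition 18, $x^4 + 1$ therefore factors over $\mathbb{Z}_3$ into two monic quadratics, and the call returns such a factorization modulo $3^6$.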


As an application of this form of Hensel’s lemma we prove 


Proposition 19 Let F be a field with a complete non-archimedean absolute value | |
and let

$$f(t) = c_nt^n + c_{n-1}t^{n-1} + \cdots + c_0 \in F[t].$$

If $c_0c_n \ne 0$ and, for some m such that $0 < m < n$,

$$|c_0| \le |c_m|, \qquad |c_n| \le |c_m|,$$

with at least one of the two inequalities strict, then f is reducible over F.


Proof Suppose first that $|c_0| < |c_m|$ and $|c_n| \le |c_m|$. Evidently we may choose m so
that $|c_m| = \max_{0 \le j \le n} |c_j|$ and $|c_i| < |c_m|$ for $0 \le i < m$. By multiplying f by $c_m^{-1}$ we
may further assume that, if R is the valuation ring of F, then $f(t) \in R[t]$, $c_m = 1$ and
$|c_i| < 1$ for $0 \le i < m$. Hence

$$\bar f(t) = t^m(\bar c_nt^{n-m} + \cdots + \bar c_{m+1}t + 1).$$

Since the two factors are relatively prime, it follows from Proposition 18 that f is
reducible.

If $|c_n| < |c_m|$ and $|c_0| \le |c_m|$, then the same argument also applies to the
polynomial $t^nf(t^{-1})$.
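The hypothesis of Proposition 19 involves only the absolute values of the coefficients, so for $F = \mathbb{Q}_p$ it can be checked mechanically. The Python sketch below handles polynomials with rational coefficients, and its function names are our own. It searches for an index m with $0 < m < n$ that satisfies both inequalities, at least one of them strictly. A result of `None` only means that the criterion does not apply. It does not prove irreducibility.

```python
from fractions import Fraction


def vp(x, p):
    """p-adic valuation of a nonzero rational number x."""
    x = Fraction(x)
    if x == 0:
        raise ValueError("v_p(0) is +infinity")
    v, num, den = 0, x.numerator, x.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v


def abs_p(x, p):
    """The p-adic absolute value |x|_p = p^(-v_p(x)), with |0|_p = 0."""
    x = Fraction(x)
    return Fraction(0) if x == 0 else Fraction(p) ** (-vp(x, p))


def prop19_index(coeffs, p):
    """Return m with 0 < m < n satisfying the hypothesis of Proposition 19, or None.

    coeffs = [c0, c1, ..., cn] are rational numbers with c0*cn != 0."""
    n = len(coeffs) - 1
    c = [abs_p(x, p) for x in coeffs]
    if c[0] == 0 or c[n] == 0:
        raise ValueError("Proposition 19 needs c_0 c_n != 0")
    for m in range(1, n):
        if c[0] <= c[m] and c[n] <= c[m] and (c[0] < c[m] or c[n] < c[m]):
            return m
    return None


print(prop19_index([5, 1, 1], 5), prop19_index([-2, 0, 1], 5))
```

For example, $t^2 + t + 5$ is certified reducible over $\mathbb{Q}_5$ with $m = 1$, whereas for $t^2 - 2$ the criterion says nothing.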
Proposition 19 shows that if a quadratic polynomial $at^2 + bt + c$ is irreducible,
then $|b| \le \max\{|a|, |c|\}$, with strict inequality if $|a| \ne |c|$. Proposition 19 will now be
used to extend an absolute value on a given field to a finite extension of that field.


Proposition 20 Let F be a field with a complete non-archimedean absolute value | |. 
If the field E is a finite extension of F, then the absolute value on F can be extended 
to an absolute value on E. 


Proof We will not only show that an extension of the absolute value exists, but we 
will provide an explicit expression for it. 

Regard E as a vector space over F of finite dimension n, and with any $a \in E$
associate the linear transformation $L_a: E \to E$ defined by $L_a(x) = ax$. Then
$\det L_a \in F$ and we claim that an extended absolute value is given by the formula

$$|a| = |\det L_a|^{1/n}.$$

Evidently $|a| \ge 0$, and equality holds only if $a = 0$, since $ax = 0$ for some
$x \ne 0$ implies $a = 0$. Furthermore $|ab| = |a||b|$, since $L_{ab} = L_aL_b$ and hence
$\det L_{ab} = (\det L_a)(\det L_b)$. If $a \in F$, then $L_a = aI$, and hence the proposed absolute
value coincides with the original absolute value on F. It only remains to show that

$$|a - b| \le \max(|a|, |b|) \quad \text{for all } a, b \in E.$$

In fact we may suppose $|a| \le |b|$ and then, by dividing by b, we see that it is sufficient
to show that $0 < |a| \le 1$ implies $|1 - a| \le 1$.
To simplify notation, write $A = L_a$ and let

$$f(t) = \det(tI - A) = t^n + c_{n-1}t^{n-1} + \cdots + c_0$$

be the characteristic polynomial of A. Then $c_i \in F$ for all i and $c_0 = (-1)^n\det A$. Let
$g(t)$ be the monic polynomial in $F[t]$ of least positive degree such that $g(a) = 0$.
Then $g(t)$ is irreducible, since the field E has no zero divisors. Evidently $g(t)$ is
also the minimal polynomial of A. But, for an arbitrary linear transformation of an
n-dimensional vector space, the characteristic polynomial divides the n-th power of
the minimal polynomial (see M. Deuring, Algebren, p.4). It follows in the present case
that $f(t) = g(t)^r$ for some positive integer r.
Suppose

$$g(t) = t^s + b_{s-1}t^{s-1} + \cdots + b_0$$

and let $a \in E$ satisfy $|a| \le 1$ with respect to the proposed absolute value. Then
$|c_0| = |\det A| \le 1$ and hence, since $b_0^r = c_0$, $|b_0| \le 1$. Since g is irreducible, it
follows from Proposition 19 that $|b_j| \le 1$ for all j. Since

$$g(1) = 1 + b_{s-1} + \cdots + b_0,$$

this implies $|g(1)| \le 1$ and hence $|f(1)| \le 1$. Since $f(1) = \det(I - A)$, this proves
that $|1 - a| \le 1$.
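For a quadratic extension, the formula $|a| = |\det L_a|^{1/n}$ is easy to evaluate. The sketch below assumes $F = \mathbb{Q}_5$ and $E = F(\sqrt 2)$. This is a genuine quadratic extension, because 2 is not a quadratic residue mod 5 (Corollary 17). To stay exact, the code handles only elements $a + b\sqrt 2$ with rational a, b, and it returns the valuation v with $|a + b\sqrt 2| = 5^{-v}$. The names are our own. The last helper spot-checks the ultrametric inequality proved above.

```python
from fractions import Fraction


def vp(x, p):
    """p-adic valuation of a nonzero rational number x."""
    x = Fraction(x)
    if x == 0:
        raise ValueError("v_p(0) is +infinity")
    v, num, den = 0, x.numerator, x.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v


def mult_matrix(a, b, d):
    """Matrix of L_alpha: x -> alpha*x on E = F(sqrt d) in the basis (1, sqrt d),
    where alpha = a + b*sqrt(d).  Its columns are the coordinates of
    alpha*1 = a + b*sqrt(d) and alpha*sqrt(d) = b*d + a*sqrt(d)."""
    return [[a, b * d], [b, a]]


def ext_valuation(a, b, d, p):
    """v(alpha) with |alpha| = |det L_alpha|_p^(1/2) = p^(-v(alpha)).

    Returns None for alpha = 0 (|0| = 0).  Assumes d is not a square in Q_p,
    so that det L_alpha = a^2 - d*b^2 vanishes only for alpha = 0."""
    if a == 0 and b == 0:
        return None
    m = mult_matrix(Fraction(a), Fraction(b), d)
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return Fraction(vp(det, p), 2)  # n = [E : F] = 2


def ultrametric_holds(x, y, d, p):
    """Check |x + y| <= max(|x|, |y|), i.e. v(x + y) >= min(v(x), v(y))."""
    inf = float("inf")
    val = lambda z: inf if z is None else z
    vx = val(ext_valuation(*x, d, p))
    vy = val(ext_valuation(*y, d, p))
    vs = val(ext_valuation(x[0] + y[0], x[1] + y[1], d, p))
    return vs >= min(vx, vy)


print(ext_valuation(5, 10, 2, 5), ultrametric_holds((1, 1), (4, -1), 2, 5))
```

For instance, `ext_valuation(5, 10, 2, 5)` equals 1, since $\det L = 25 - 2 \cdot 100 = -175 = -7 \cdot 5^2$.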


Finally we show that there is no other extension to E of the given absolute value
on F besides the one constructed in the proof of Proposition 20.
Proposition 21 Let F be a complete field with respect to the absolute value | | and
let the field E be a finite extension of F. Then there is at most one extension of the 
absolute value on F to an absolute value on E, and E is necessarily complete with 
respect to the extended absolute value. 


Proof Let $e_1, \ldots, e_n$ be a basis for E, regarded as a vector space over F. Then any
$a \in E$ can be uniquely expressed in the form

$$a = \alpha_1e_1 + \cdots + \alpha_ne_n,$$

where $\alpha_1, \ldots, \alpha_n \in F$. By Lemma 7, for any extended absolute value there exist
positive real numbers $\sigma, \mu$ such that

$$\sigma|a| \le \max_i |\alpha_i| \le \mu|a| \quad \text{for every } a \in E.$$

It follows at once that E is complete. For if $(a^{(k)})$ is a fundamental sequence, then $(\alpha_i^{(k)})$
is a fundamental sequence in F for $i = 1, \ldots, n$. Since F is complete, there exist $\alpha_i \in F$
such that $\alpha_i^{(k)} \to \alpha_i$ $(i = 1, \ldots, n)$ and then $a^{(k)} \to a$, where $a = \alpha_1e_1 + \cdots + \alpha_ne_n$.

It will now be shown that there is at most one extension to E of the absolute value
on F. Since we saw in §4 that the trivial absolute value on E is the only extension of
the trivial absolute value on F, we may assume that the given absolute value on F is
nontrivial. For a fixed $a \in E$, consider the powers $a, a^2, \ldots$. For each k we can write

$$a^k = \alpha_1^{(k)}e_1 + \cdots + \alpha_n^{(k)}e_n.$$

Since $|a| < 1$ if and only if $|a^k| \to 0$, it follows from the remarks at the beginning
of the proof that $|a| < 1$ if and only if $\alpha_i^{(k)} \to 0$ as $k \to \infty$ $(i = 1, \ldots, n)$. This condition is
independent of the absolute value on E. Thus if there exist two absolute values, $|\cdot|_1$
and $|\cdot|_2$, which extend the absolute value on F, then $|a|_1 < 1$ if and only if $|a|_2 < 1$.
Hence, by Proposition 3, there exists a positive real number $\rho$ such that

$$|a|_2 = |a|_1^\rho \quad \text{for every } a \in E.$$

In fact $\rho = 1$, since for some $a \in F$ we have $|a|_2 = |a|_1 > 1$.


6 Locally Compact Valued Fields 


We prove first a theorem of Ostrowski (1918): 


Theorem 22 A complete archimedean valued field F is (isomorphic to) either the real 
field R or the complex field C, and its absolute value is equivalent to the usual absolute 
value. 


Proof Since the valuation on it is archimedean, the field F has characteristic 0 and 
thus contains Q. Since an archimedean absolute value on Q is equivalent to the usual 
absolute value, by replacing the given absolute value on F by an equivalent one we 
may assume that it reduces to the usual absolute value on Q. Since the valuation on F 
is complete, it now follows that F contains (a copy of) $\mathbb{R}$ and that the absolute value
on F reduces to the usual absolute value on $\mathbb{R}$. If F contains an element i such that
$i^2 = -1$, then F contains (a copy of) $\mathbb{C}$ and, by Proposition 21, the absolute value on
F reduces to the usual absolute value on $\mathbb{C}$.
We now show that if $a \in F$ and $|a| < 1$, then $1 - a$ is a square in F. Let B be the
set of all $x \in F$ such that $|x| \le |a|$ and, for any $x \in B$, put

$$Tx = (x^2 + a)/2.$$

Then also $Tx \in B$, since

$$|Tx| \le (|x|^2 + |a|)/2 \le (|a|^2 + |a|)/2 \le |a|.$$

Moreover, the map T is a contraction since, for all $x, y \in B$,

$$|Tx - Ty| = |x^2 - y^2|/2 = |x - y||x + y|/2 \le |a||x - y|.$$

Since F is complete and B is a closed subset of F, it follows from the contraction
principle (Proposition 1.26) that the map T has a fixed point $x \in B$. Evidently
$x = (x^2 + a)/2$ and

$$1 - a = 1 - 2x + x^2 = (1 - x)^2.$$
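The fixed point in this argument is reached by iterating T, so it can be computed. The following is a minimal floating-point sketch in Python for the complete archimedean fields $\mathbb{R}$ and $\mathbb{C}$. The function name and the tolerance are our own choices. Starting from $x = 0 \in B$, it iterates $x \mapsto (x^2 + a)/2$ and returns $1 - x$, whose square is $1 - a$. The error shrinks at least by the factor $|a|$ per step, so convergence is slow when $|a|$ is close to 1.

```python
def sqrt_one_minus(a, tol=1e-15, max_iter=500):
    """Square root of 1 - a for |a| < 1 (a real or complex), found as 1 - x,
    where x is the fixed point of the contraction T x = (x^2 + a)/2 on |x| <= |a|."""
    if not abs(a) < 1:
        raise ValueError("need |a| < 1")
    x = 0 * a  # 0 lies in the ball B = {x : |x| <= |a|}
    for _ in range(max_iter):
        x_new = (x * x + a) / 2
        if abs(x_new - x) <= tol:
            x = x_new
            break
        x = x_new
    return 1 - x  # at the fixed point, (1 - x)^2 = 1 - 2x + x^2 = 1 - a


print(sqrt_one_minus(0.5), sqrt_one_minus(0.3 + 0.4j))
```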


We show next that, if the polynomial $t^2 + 1$ does not have a root in F, then the
valuation on F can be extended to the field $E = F(i)$, where $i^2 = -1$. Each $y \in E$ has
a unique representation $y = a + ib$, where $a, b \in F$. We claim that $|y| = \sqrt{|a^2 + b^2|}$
is an extension to E of the given valuation on F.

The only part of this claim which is not easily established is the triangle inequality.
To prove it, we need only show that

$$|1 + y| \le 1 + |y| \quad \text{for every } y \in E.$$

That is, we need only show that

$$|(1 + a)^2 + b^2| \le 1 + 2\sqrt{|a^2 + b^2|} + |a^2 + b^2| \quad \text{for all } a, b \in F.$$

Since, by the triangle inequality in F,

$$|(1 + a)^2 + b^2| \le 1 + 2|a| + |a^2 + b^2|,$$

it is enough to show that

$$|a| \le \sqrt{|a^2 + b^2|} \quad \text{for all } a, b \in F$$

or, since we may suppose $a \ne 0$,

$$1 \le |1 + c^2| \quad \text{for every } c \in F.$$


Assume, on the contrary, that $|1 + c^2| < 1$ for some $c \in F$. Then, by the previous
part of the proof,

$$-c^2 = 1 - (1 + c^2) = x^2 \quad \text{for some } x \in F.$$

Since $c \ne 0$, this implies that $-1 = i^2$ for some $i \in F$, which is a contradiction.
Now $E = F(i)$ contains $\mathbb{C}$ and the absolute value on E reduces to the usual
absolute value on $\mathbb{C}$. To prove the theorem it is enough to show that $E = \mathbb{C}$. For
then $\mathbb{R} \subseteq F \subseteq \mathbb{C}$ and F has dimension 1 or 2 as a vector space over $\mathbb{R}$ according as
$i \notin F$ or $i \in F$.

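Assume on the contrary that there exists $\zeta \in E \setminus \mathbb{C}$. Consider the function
$\varphi: \mathbb{C} \to \mathbb{R}$ defined by

$$\varphi(z) = |z - \zeta|$$

and put $r = \inf_{z \in \mathbb{C}} \varphi(z)$. Since $\varphi(0) = |\zeta|$ and $\varphi(z) > |\zeta|$ for $|z| > 2|\zeta|$, and since
$\varphi$ is continuous, the compact set $\{z \in \mathbb{C} : |z| \le 2|\zeta|\}$ contains a point w such that
$\varphi(w) = r$.

Thus if we put $\omega = \zeta - w$, then $\omega \ne 0$ and

$$0 < r = |\omega| \le |\omega - z| \quad \text{for every } z \in \mathbb{C}.$$

We will show that $|\omega - z| = r$ for every $z \in \mathbb{C}$ such that $|z| < r$.
If $\varepsilon = e^{2\pi i/n}$, then

$$\omega^n - z^n = (\omega - z)(\omega - \varepsilon z)\cdots(\omega - \varepsilon^{n-1}z)$$

and hence

$$r^{n-1}|\omega - z| \le |\omega^n - z^n| = r^n|1 - z^n/\omega^n|.$$

Thus $|\omega - z| \le r|1 - z^n/\omega^n|$. Since $|z| < |\omega|$, by letting $n \to \infty$ we obtain $|\omega - z| \le r$.
But this is possible only if $|\omega - z| = r$.

Thus if $0 < |z| < r$, then $\omega$ may be replaced by $\omega - z$. It follows that $|\omega - nz| = r$
for every positive integer n. Hence $r \ge n|z| - r$, which yields a contradiction for
sufficiently large n.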


If a field F is locally compact with respect to an archimedean absolute value, then 
it is certainly complete and so, by Theorem 22, it is equivalent either to R or to C with 
the usual absolute value. It will now be shown that a field F is locally compact with 
respect to a non-archimedean absolute value if and only if it is a complete field of the 
type discussed in Proposition 13. It should be observed that a non-archimedean valued 
field F is locally compact if and only if its valuation ring R is compact, since then any 
closed ball in F is compact.


Proposition 23 Let F be a field with a non-archimedean absolute value | |. Then F is 
locally compact with respect to the topology induced by the absolute value if and only 
if the following three conditions are satisfied: 


(i) F is complete, 
(ii) the absolute value | | is discrete, 
(iii) the residue field is finite. 


Proof As we have just observed, F is locally compact if and only if its valuation ring 
R is compact. Moreover, since R is a subset of the metric space F, it is compact if and 
only if any sequence of elements of R has a convergent subsequence. 
The field F is certainly complete if it is locally compact, since any fundamental
sequence is bounded. If the residue field is infinite, then there exists an infinite
sequence $(a_k)$ of elements of R such that $|a_k - a_j| = 1$ for $j \ne k$. Since the sequence
$(a_k)$ has no convergent subsequence, R is not compact. If the absolute value | | is not
discrete, then there exists an infinite sequence $(a_k)$ of elements of R with

$$|a_1| < |a_2| < \cdots$$

and $|a_k| \to 1$ as $k \to \infty$. If $k > j$, then $|a_k - a_j| = |a_k|$ and again the sequence $(a_k)$
has no convergent subsequence. Thus the conditions (i)–(iii) are all necessary for F to
be locally compact.

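Suppose now that the conditions (i)–(iii) are all satisfied and let $\sigma = (a_k)$ be a
sequence of elements of R. In the notation of Proposition 13, let

$$a_k = \sum_{n \ge 0} \alpha_n^{(k)}\pi^n,$$

where $\alpha_n^{(k)} \in S$. Since S is finite, there exists $\alpha_0 \in S$ such that $\alpha_0^{(k)} = \alpha_0$ for infinitely
many $a_k \in \sigma$. If $\sigma_0$ is the subsequence of $\sigma$ containing those $a_k$ for which $\alpha_0^{(k)} = \alpha_0$,
then there exists $\alpha_1 \in S$ such that $\alpha_1^{(k)} = \alpha_1$ for infinitely many $a_k \in \sigma_0$. Similarly, if
$\sigma_1$ is the subsequence of $\sigma_0$ containing those $a_k$ for which $\alpha_1^{(k)} = \alpha_1$, then there exists
$\alpha_2 \in S$ such that $\alpha_2^{(k)} = \alpha_2$ for infinitely many $a_k \in \sigma_1$. And so on. If $a^{(j)} \in \sigma_j$, then

$$a^{(j)} = \alpha_0 + \alpha_1\pi + \cdots + \alpha_j\pi^j + \sum_{n > j} \gamma_n\pi^n$$

for some $\gamma_n \in S$. But $a = \sum_{n \ge 0} \alpha_n\pi^n \in F$, since F is complete, and $|a^{(j)} - a| \le |\pi|^{j+1}$. Thus the
subsequence $(a^{(j)})$ of $\sigma$ converges to a.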
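In $\mathbb{Z}_p$ one can take $\pi = p$ and $S = \{0, 1, \ldots, p-1\}$. This choice is an assumption made for the illustration, since the text only requires some finite set of representatives. The Python sketch below uses names of our own. It computes the first digits of a rational p-adic integer. It then imitates the digit-by-digit selection of the proof on a finite list: at each stage it keeps the elements whose next digit is the most common one, which is the finite stand-in for "infinitely many".

```python
from collections import Counter
from fractions import Fraction


def padic_digits(x, p, N):
    """First N digits alpha_0, ..., alpha_{N-1} in S = {0, ..., p-1} of a rational
    p-adic integer x, so that x = alpha_0 + alpha_1 p + ... (mod p^N)."""
    x = Fraction(x)
    if x.denominator % p == 0:
        raise ValueError("x is not a p-adic integer")
    M = p ** N
    r = x.numerator * pow(x.denominator, -1, M) % M  # residue of x mod p^N
    digits = []
    for _ in range(N):
        r, d = divmod(r, p)
        digits.append(d)
    return digits


def refine(seq, p, depth):
    """Finite imitation of the proof: for n = 0, ..., depth-1 choose the most common
    n-th digit alpha_n among the surviving elements and keep only those elements."""
    digits = [padic_digits(a, p, depth) for a in seq]
    keep = list(range(len(seq)))
    chosen = []
    for n in range(depth):
        alpha_n = Counter(digits[i][n] for i in keep).most_common(1)[0][0]
        chosen.append(alpha_n)
        keep = [i for i in keep if digits[i][n] == alpha_n]
    return chosen, [seq[i] for i in keep]


print(padic_digits(-1, 5, 4), padic_digits(Fraction(1, 3), 2, 6))
```

Here $-1$ has 5-adic digits $4, 4, 4, 4, \ldots$, and $1/3$ has 2-adic digits $1, 1, 0, 1, 0, 1, \ldots$, because $3 \cdot 43 \equiv 1 \pmod{64}$ and $43 = 1 + 2 + 2^3 + 2^5$.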


Corollary 24 The field Q, of p-adic numbers is locally compact, and the ring Zp of 
p-adic integers is compact. 


Corollary 25 If K is a finite field, then the field K((t)) of all formal Laurent series is
locally compact, and the ring K[[t]] of all formal power series is compact. 


We now show that all locally compact valued fields F with a non-archimedean 
absolute value can in fact be explicitly determined. It is convenient to treat the cases 
where F has prime characteristic and zero characteristic separately, since the argu- 
ments in the two cases are quite different. 


Lemma 26 Let F be a locally compact valued field with a nontrivial valuation. A 
normed vector space E over F is locally compact if and only if it is finite-dimensional. 


Proof Suppose first that E is finite-dimensional over F. If $e_1, \ldots, e_n$ is a basis for the
vector space E, then any $a \in E$ can be uniquely represented in the form

$$a = \alpha_1e_1 + \cdots + \alpha_ne_n,$$

where $\alpha_1, \ldots, \alpha_n \in F$, and

$$\|a\|_0 = \max_{1 \le i \le n} |\alpha_i|$$

is a norm on E. Since the field F is locally compact, it is also complete. Hence, by
Lemma 7, there exist positive real constants $\sigma, \mu$ such that

$$\sigma\|a\|_0 \le \|a\| \le \mu\|a\|_0 \quad \text{for every } a \in E.$$

Consequently, if $\{a_k\}$ is a bounded sequence of elements of E then, for each
$j \in \{1, \ldots, n\}$, the corresponding coefficients $\{\alpha_{kj}\}$ form a bounded sequence of elements
of F. Hence, since F is locally compact, there exists a subsequence $\{a_{k_v}\}$ such that
each of the sequences $\{\alpha_{k_vj}\}$ converges in F, with limit $\beta_j$ say $(j = 1, \ldots, n)$. It follows
that the subsequence $\{a_{k_v}\}$ converges in E with limit $b = \beta_1e_1 + \cdots + \beta_ne_n$. Thus
E is locally compact.

Suppose next that E is infinite-dimensional over F. Since the valuation on F is
nontrivial, there exists $\alpha \in F$ such that $r = |\alpha|$ satisfies $0 < r < 1$. Let V be any
finite-dimensional subspace of E, let $u' \in E \setminus V$ and let

$$d = \inf_{v \in V} \|u' - v\|.$$

Since V is locally compact, $d > 0$ and $d = \|u' - v'\|$ for some $v' \in V$. Choose $k \in \mathbb{Z}$
so that $r^{k+1} < d \le r^k$ and put $w' = \alpha^{-k}(u' - v')$. For any $v \in V$,

$$\|\alpha^kv + v' - u'\| \ge d$$

and hence

$$\|w' - v\| \ge dr^{-k} > r.$$

On the other hand,

$$\|w'\| = dr^{-k} \le 1.$$

We now define a sequence $\{w_m\}$ of elements of E in the following way. Taking
$V = \{0\}$ we obtain a vector $w_1$ with $r < \|w_1\| \le 1$. Suppose we have defined
$w_1, \ldots, w_m \in E$ so that, for $1 \le j \le m$, $\|w_j\| \le 1$ and $\|w_j - v_j\| > r$ for all $v_j$
in the vector subspace $V_{j-1}$ of E spanned by $w_1, \ldots, w_{j-1}$. Then, taking $V = V_m$,
we obtain a vector $w_{m+1}$ such that $\|w_{m+1}\| \le 1$ and $\|w_{m+1} - v_{m+1}\| > r$ for all
$v_{m+1} \in V_m$. Thus the process can be continued indefinitely. Since $\|w_m\| \le 1$ for all m
and $\|w_m - w_j\| > r$ for $1 \le j < m$, the bounded sequence $\{w_m\}$ has no convergent
subsequence. Thus E is not locally compact.


Proposition 27 A non-archimedean valued field E with zero characteristic is locally
compact if and only if, for some prime p, E is isomorphic to a finite extension of the
field $\mathbb{Q}_p$ of p-adic numbers.


Proof If E is a finite extension of the p-adic field $\mathbb{Q}_p$ then, since $\mathbb{Q}_p$ is locally compact,
so also is E, by Lemma 26.

Suppose on the other hand that E is a locally compact valued field with zero characteristic.
Then $\mathbb{Q} \subseteq E$. By Proposition 23, the residue field $k = R/M$ is finite and
thus has prime characteristic p. It follows from Proposition 4 that the restriction to $\mathbb{Q}$
of the absolute value on E is (equivalent to) the p-adic absolute value. Hence, since E
is necessarily complete, $\mathbb{Q}_p \subseteq E$. If E were infinite-dimensional as a vector space over
$\mathbb{Q}_p$, then, by Lemma 26, it would not be locally compact. Hence E is a finite extension
of $\mathbb{Q}_p$.


We consider next locally compact valued fields of prime characteristic. 


Proposition 28 A valued field F with prime characteristic p is locally compact if and 
only if F is isomorphic to the field K ((t)) of formal Laurent series over a finite field 
K of characteristic p, with the absolute value defined in example (iv) of §1. The finite 
field K is the residue field of F. 


Proof We need only prove the necessity of the condition, since (Corollary 25) we have
already established its sufficiency. Since F has prime characteristic, the absolute value
on F is non-archimedean. Hence, by Proposition 23 and Lemma 12, the absolute value
on F is discrete and the valuation ideal M is a principal ideal. Let $\pi$ be a generating
element for M. By Proposition 23 also, the residue field $k = R/M$ is finite. Evidently
the characteristic of k must also be p. Let $q = p^f$ be the number of elements in k.
Since F has characteristic p, for any $a, b \in F$,

$$(b - a)^p = b^p - a^p$$

and hence, by induction,

$$(b - a)^{p^n} = b^{p^n} - a^{p^n} \quad \text{for all } n \ge 1.$$

The multiplicative group of k is a cyclic group of order $q - 1$. Choose $a \in R$ so that
$a + M$ generates this cyclic group. Then $|a^q - a| < 1$. By what we have just proved,

$$a^{q^{n+1}} - a^{q^n} = (a^q - a)^{q^n},$$

and hence $(a^{q^n})$ is a fundamental sequence, by Lemma 10. Since F is complete, by
Proposition 23, it follows that $a^{q^n} \to \alpha \in R$. Moreover $\alpha^q = \alpha$, since

$$\lim_{n \to \infty} (a^{q^n})^q = \lim_{n \to \infty} a^{q^{n+1}},$$

and $\alpha - a \in M$, since $a^{q^{n+1}} - a^{q^n} \in M$ for every $n \ge 0$. Hence $\alpha \ne 0$ and $\alpha^{q-1} = 1$.
Moreover $\alpha^j \ne 1$ for $1 \le j < q - 1$, since $\alpha^j \equiv a^j \bmod M$. It follows that the set
S consisting of 0 and the powers $1, \alpha, \ldots, \alpha^{q-2}$ is a set of representatives in R of the
residue field k.

Since F has characteristic p, $\alpha$ generates a finite subring K of R. In fact K is a
field, since $\beta^q = \beta$ for every $\beta \in K$ and so $\beta\beta^{q-2} = 1$ if $\beta \ne 0$. Since $S \subseteq K$ and the
polynomial $x^q - x$ has at most q roots in K, we conclude that $S = K$. Thus K has q
elements and is isomorphic to the residue field k.

Every element a of F has a unique representation

$$a = \sum_{n \in \mathbb{Z}} \alpha_n\pi^n,$$

where $\pi$ is a generating element for the principal ideal M, $\alpha_n \in S$ and $\alpha_n \ne 0$ for at
most finitely many $n < 0$. The map

$$a' = \sum_{n \in \mathbb{Z}} \alpha_nt^n \mapsto a = \sum_{n \in \mathbb{Z}} \alpha_n\pi^n$$

is a bijection of the field K((t)) onto F. Since S is closed under addition this map preserves
sums, and since S is also closed under multiplication it also preserves products.
Finally, if N is the least integer such that $\alpha_N \ne 0$, then $|a| = |\pi|^N$ and $|a'| = \rho^{-N}$
for some fixed $\rho > 1$. Hence the map is an isomorphism of the valued field K((t))
onto F.


7 Further Remarks 


Valued fields are discussed in more detail in the books of Cassels [1], Endler [3] and 
Ribenboim [5]. 

For still more forms of Hensel’s lemma, see Ribenboim [6]. There are also gen- 
eralizations to polynomials in several variables and to power series. The algorithmic 
implementation of Hensel’s lemma is studied in von zur Gathen [4]. Newton’s method 
for finding real or complex zeros is discussed in Stoer and Bulirsch [7], for example. 

Proposition 20 continues to hold if the word ‘complete’ is omitted from its state- 
ment. However, the formula given in the proof of Proposition 20 defines an absolute 
value on E if and only if there is a unique extension of the absolute value on F to an
absolute value on E; see Viswanathan [8]. 

Ostrowski’s Theorem 22 has been generalized by weakening the requirement 
$|ab| = |a||b|$ to $|ab| \le |a||b|$. Mazur (1938) proved that the only normed associative
division algebras over R are R, C and H, and that the only normed associative division 
algebra over C is C itself. An elegant functional-analytic proof of the latter result was 
given by Gelfand (1941). See Chapter 8 (by Koecher and Remmert) of Ebbinghaus 
et al. [2]. 
VII


The Arithmetic of Quadratic Forms 


We have already determined the integers which can be represented as a sum of 
two squares. Similarly, one may ask which integers can be represented in the form
$x^2 + 2y^2$ or, more generally, in the form $ax^2 + 2bxy + cy^2$, where a, b, c are given
integers. The arithmetic theory of binary quadratic forms, which had its origins in 
the work of Fermat, was extensively developed during the 18th century by Euler, 
Lagrange, Legendre and Gauss. The extension to quadratic forms in more than two 
variables, which was begun by them and is exemplified by Lagrange’s theorem that 
every positive integer is a sum of four squares, was continued during the 19th cen- 
tury by Dirichlet, Hermite, H.J.S. Smith, Minkowski and others. In the 20th century 
Hasse and Siegel made notable contributions. With Hasse’s work especially it be- 
came apparent that the theory is more perspicuous if one allows the variables to be 
rational numbers, rather than integers. This opened the way to the study of quadratic 
forms over arbitrary fields, with pioneering contributions by Witt (1937) and Pfister 
(1965-67). 

From this vast theory we focus attention on one central result, the Hasse–Minkowski
theorem. However, we first study quadratic forms over an arbitrary field in the geo-
metric formulation of Witt. Then, following an interesting approach due to Fröhlich
(1967), we study quadratic forms over a Hilbert field.


1 Quadratic Spaces 


The theory of quadratic spaces is simply another name for the theory of quadratic 
forms. The advantage of the change in terminology lies in its appeal to geometric 
intuition. It has in fact led to new results even at quite an elementary level. The new 
approach had its debut in a paper by Witt (1937) on the arithmetic theory of quadratic 
forms, but it is appropriate also if one is interested in quadratic forms over the real field 
or any other field. 

For the remainder of this chapter we will restrict attention to fields for which
$1 + 1 \ne 0$. Thus the phrase ‘an arbitrary field’ will mean ‘an arbitrary field of charac-
teristic $\ne 2$’. The proofs of many results make essential use of this restriction on the
characteristic. For any field F, we will denote by $F^\times$ the multiplicative group of all
nonzero elements of F. The squares in $F^\times$ form a subgroup $F^{\times 2}$ and any coset of this
subgroup is called a square class.

Let V be a finite-dimensional vector space over such a field F. We say that V is a
quadratic space if with each ordered pair u, v of elements of V there is associated an
element $(u, v)$ of F such that

(i) $(u_1 + u_2, v) = (u_1, v) + (u_2, v)$ for all $u_1, u_2, v \in V$;
(ii) $(\alpha u, v) = \alpha(u, v)$ for every $\alpha \in F$ and all $u, v \in V$;
(iii) $(u, v) = (v, u)$ for all $u, v \in V$.

It follows that

(i)′ $(u, v_1 + v_2) = (u, v_1) + (u, v_2)$ for all $u, v_1, v_2 \in V$;
(ii)′ $(u, \alpha v) = \alpha(u, v)$ for every $\alpha \in F$ and all $u, v \in V$.


Let $e_1, \ldots, e_n$ be a basis for the vector space V. Then any $u, v \in V$ can be uniquely
expressed in the form

$$u = \sum_{j=1}^n \xi_je_j, \qquad v = \sum_{j=1}^n \eta_je_j,$$

where $\xi_j, \eta_j \in F$ $(j = 1, \ldots, n)$, and

$$(u, v) = \sum_{j,k=1}^n \alpha_{jk}\xi_j\eta_k,$$

where $\alpha_{jk} = (e_j, e_k) = \alpha_{kj}$. Thus

$$(u, u) = \sum_{j,k=1}^n \alpha_{jk}\xi_j\xi_k$$

is a quadratic form with coefficients in F. The quadratic space is completely determined
by the quadratic form, since

$$(u, v) = \{(u + v, u + v) - (u, u) - (v, v)\}/2. \tag{1}$$

Conversely, for a given basis $e_1, \ldots, e_n$ of V, any $n \times n$ symmetric matrix
$A = (\alpha_{jk})$ with elements from F, or the associated quadratic form $f(x) = x^tAx$,
may be used in this way to give V the structure of a quadratic space.
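Formula (1) shows that the bilinear form, and hence the matrix $A = (\alpha_{jk})$, can be read off from the quadratic form alone. The small Python sketch below, with names of our own, recovers A from a quadratic map given as a function, using exact fractions. The division by 2 is where the assumption $1 + 1 \ne 0$ enters.

```python
from fractions import Fraction


def gram_matrix(Q, n):
    """Recover the symmetric matrix ((e_j, e_k)) from the quadratic map Q(x) = (x, x)
    via formula (1): (u, v) = {Q(u + v) - Q(u) - Q(v)}/2."""
    basis = [[int(i == j) for j in range(n)] for i in range(n)]

    def bilinear(u, v):
        w = [s + t for s, t in zip(u, v)]
        return Fraction(Q(w) - Q(u) - Q(v), 2)

    return [[bilinear(basis[j], basis[k]) for k in range(n)] for j in range(n)]


Q = lambda x: x[0] ** 2 + 3 * x[0] * x[1] + 2 * x[1] ** 2
A = gram_matrix(Q, 2)
print(A)
```

For $\xi_1^2 + 3\xi_1\xi_2 + 2\xi_2^2$ the result is the matrix with rows $(1, 3/2)$ and $(3/2, 2)$. So the entries of A need not be integers even when the form has integer coefficients.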

Let $e'_1, \ldots, e'_n$ be any other basis for V. Then

$$e_i = \sum_{j=1}^n \tau_{ji}e'_j,$$

where $T = (\tau_{ji})$ is an invertible $n \times n$ matrix with elements from F. Conversely, any
such matrix T defines in this way a new basis $e'_1, \ldots, e'_n$. Since

$$(e_i, e_k) = \sum_{j,h=1}^n \tau_{ji}\beta_{jh}\tau_{hk},$$

where $\beta_{jh} = (e'_j, e'_h)$, the matrix $B = (\beta_{jh})$ is symmetric and

$$A = T^tBT. \tag{2}$$


Two symmetric matrices A, B with elements from F are said to be congruent if (2)
holds for some invertible matrix T with elements from F. Thus congruence of symmetric
matrices corresponds to a change of basis in the quadratic space. Evidently
congruence is an equivalence relation, i.e. it is reflexive, symmetric and transitive. Two
quadratic forms are said to be equivalent over F if their coefficient matrices are congruent.
Equivalence over F of the quadratic forms f and g will be denoted by $f \sim_F g$
or simply $f \sim g$.
It follows from (2) that

$$\det A = (\det T)^2\det B.$$


Thus, although det A is not uniquely determined by the quadratic space, if it is nonzero, 
its square class is uniquely determined. By abuse of language, we will call any repre- 
sentative of this square class the determinant of the quadratic space V and denote it by 
det V. 
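Here is a quick exact check of (2) and of the determinant relation, in Python with fractions. The helper names are our own. The sketch forms $A = T^tBT$ and compares $\det A$ with $(\det T)^2\det B$, which shows that $\det A$ and $\det B$ lie in the same square class.

```python
from fractions import Fraction


def transpose(M):
    return [list(row) for row in zip(*M)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def det(M):
    """Determinant over Q by Gaussian elimination with exact fractions."""
    M = [[Fraction(x) for x in row] for row in M]
    n, d = len(M), Fraction(1)
    for c in range(n):
        piv = next((r for r in range(c, n) if M[r][c] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != c:
            M[c], M[piv] = M[piv], M[c]
            d = -d  # a row swap changes the sign
        d *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return d


def change_basis(B, T):
    """A = T^t B T as in (2): the matrix in the basis e_i = sum_j tau_ji e'_j."""
    return matmul(matmul(transpose(T), B), T)


B = [[1, 0], [0, -3]]  # the form xi_1^2 - 3 xi_2^2 in the basis e'_1, e'_2
T = [[1, 2], [1, -1]]
A = change_basis(B, T)
print(A, det(A), det(T) ** 2 * det(B))
```

With these choices $A$ has rows $(-2, 5)$ and $(5, 1)$, and $\det A = -27 = (-3)^2 \cdot (-3)$.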

Although quadratic spaces are better adapted for proving theorems, quadratic 
forms and symmetric matrices are useful for computational purposes. Thus a famil- 
iarity with both languages is desirable. However, we do not feel obliged to give two 
versions of each definition or result, and a version in one language may later be used 
in the other without explicit comment. 

A vector v is said to be orthogonal to a vector u if $(u, v) = 0$. Then also u is
orthogonal to v. The orthogonal complement $U^\perp$ of a subspace U of V is defined to
be the set of all $v \in V$ such that $(u, v) = 0$ for every $u \in U$. Evidently $U^\perp$ is again a
subspace. A subspace U will be said to be non-singular if $U \cap U^\perp = \{0\}$.

The whole space V is itself non-singular if and only if $V^\perp = \{0\}$. Thus V is
non-singular if and only if some, and hence every, symmetric matrix describing it is
non-singular, i.e. if and only if $\det V \ne 0$.

We say that a quadratic space V is the orthogonal sum of two subspaces V; and 
V2, and we write V = Vj LV2, if V = Vj + V2, Vi NM Vo = {0} and (v1, v2) = 0 for all 
vy € Vj, v2 € Vo. 

If $A_1$ is a coefficient matrix for $V_1$ and $A_2$ a coefficient matrix for $V_2$, then

$$A = \begin{pmatrix} A_1 & 0 \\ 0 & A_2 \end{pmatrix}$$

is a coefficient matrix for $V = V_1 \perp V_2$. Thus $\det V = (\det V_1)(\det V_2)$. Evidently $V$ is non-singular if and only if both $V_1$ and $V_2$ are non-singular.
If $W$ is any subspace supplementary to the orthogonal complement $V^\perp$ of the whole space $V$, then $V = V^\perp \perp W$ and $W$ is non-singular. Many problems for arbitrary quadratic spaces may be reduced in this way to non-singular quadratic spaces.
Proposition 1 If a quadratic space $V$ contains a vector $u$ such that $(u,u) \neq 0$, then

$$V = U \perp U^\perp,$$

where $U = \langle u \rangle$ is the one-dimensional subspace spanned by $u$.


Proof For any vector $v \in V$, put $v' = v - \alpha u$, where $\alpha = (v,u)/(u,u)$. Then $(v',u) = 0$ and hence $v' \in U^\perp$. Since $U \cap U^\perp = \{0\}$, the result follows.


A vector space basis $u_1, \dots, u_n$ of a quadratic space $V$ is said to be an orthogonal basis if $(u_j, u_k) = 0$ whenever $j \neq k$.


Proposition 2 Any quadratic space V has an orthogonal basis. 


Proof If $V$ has dimension 1, there is nothing to prove. Suppose $V$ has dimension $n > 1$ and the result holds for quadratic spaces of lower dimension. If $(v,v) = 0$ for all $v \in V$, then any basis is an orthogonal basis, by (1). Hence we may assume that $V$ contains a vector $u_1$ such that $(u_1,u_1) \neq 0$. If $U_1$ is the 1-dimensional subspace spanned by $u_1$, then, by Proposition 1,

$$V = U_1 \perp U_1^\perp.$$

By the induction hypothesis $U_1^\perp$ has an orthogonal basis $u_2, \dots, u_n$, and $u_1, u_2, \dots, u_n$ is then an orthogonal basis for $V$.


Proposition 2 says that any symmetric matrix $A$ is congruent to a diagonal matrix, or that the corresponding quadratic form $f$ is equivalent over $F$ to a diagonal form $\delta_1\xi_1^2 + \cdots + \delta_n\xi_n^2$. Evidently $\det f = \delta_1 \cdots \delta_n$, and $f$ is non-singular if and only if $\delta_j \neq 0$ $(1 \le j \le n)$. If $A \neq 0$ then, by Propositions 1 and 2, we can take $\delta_1$ to be any element of $F^\times$ which is represented by $f$.

Here $\gamma \in F^\times$ is said to be represented by a quadratic space $V$ over the field $F$ if there exists a vector $v \in V$ such that $(v,v) = \gamma$.
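Proposition 2 is constructive. The following Python sketch works over $\mathbb{Q}$ only, and the name `diagonalize` is ours. It performs simultaneous row and column operations, which amounts to changing basis as in the proofs of Propositions 1 and 2: it finds a non-isotropic basis vector (using $u + v$ when $(u,u) = (v,v) = 0 \neq (u,v)$) and then subtracts the projections of the later basis vectors onto it. It returns $D = T^t A T$ diagonal together with $T$.

```python
from fractions import Fraction


def diagonalize(A):
    """Return (D, T) with D = T^t A T diagonal, for a symmetric matrix A over Q.

    Column j of T is the j-th vector of an orthogonal basis; the steps mirror
    the proofs of Propositions 1 and 2 (characteristic 0, so 2 is invertible).
    """
    n = len(A)
    D = [[Fraction(x) for x in row] for row in A]
    T = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

    def add_multiple(j, k, c):
        """Replace basis vector j by (basis vector j) + c * (basis vector k)."""
        for i in range(n):
            T[i][j] += c * T[i][k]
        for i in range(n):          # column operation on the Gram matrix
            D[i][j] += c * D[i][k]
        for i in range(n):          # the matching row operation
            D[j][i] += c * D[k][i]

    def swap(j, k):
        """Exchange basis vectors j and k."""
        for row in T:
            row[j], row[k] = row[k], row[j]
        for row in D:
            row[j], row[k] = row[k], row[j]
        D[j], D[k] = D[k], D[j]

    for k in range(n):
        if D[k][k] == 0:
            j = next((j for j in range(k + 1, n) if D[j][j] != 0), None)
            if j is not None:
                swap(j, k)
            else:
                j = next((j for j in range(k + 1, n) if D[k][j] != 0), None)
                if j is None:
                    continue        # basis vector k is orthogonal to all the others
                # (e_k, e_k) = (e_j, e_j) = 0, so (e_k + e_j, e_k + e_j) = 2 (e_k, e_j) != 0
                add_multiple(k, j, Fraction(1))
        for j in range(k + 1, n):
            # Proposition 1: make basis vector j orthogonal to the non-isotropic vector k
            add_multiple(j, k, -D[j][k] / D[k][k])
    return D, T


D, T = diagonalize([[0, 1], [1, 0]])        # the form 2*xi*eta
print([str(D[i][i]) for i in range(2)])     # ['2', '-1/2']
```

Each step is an elementary change of basis (adding a multiple of one basis vector to another, or swapping two), so in this sketch $\det T = \pm 1$ and hence $\det D = \det A$.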

As an application of Proposition 2 we prove 


Proposition 3 If $U$ is a non-singular subspace of the quadratic space $V$, then $V = U \perp U^\perp$.


Proof Let $u_1, \dots, u_m$ be an orthogonal basis for $U$. Then $(u_j,u_j) \neq 0$ $(1 \le j \le m)$, since $U$ is non-singular. For any vector $v \in V$, let $u = \alpha_1 u_1 + \cdots + \alpha_m u_m$, where $\alpha_j = (v,u_j)/(u_j,u_j)$ for each $j$. Then $u \in U$ and $(u,u_j) = (v,u_j)$ $(1 \le j \le m)$. Hence $v - u \in U^\perp$. Since $U \cap U^\perp = \{0\}$, the result follows.


It may be noted that if $U$ is a non-singular subspace and $V = U \perp W$ for some subspace $W$, then necessarily $W = U^\perp$. For it is obvious that $W \subseteq U^\perp$ and $\dim W = \dim V - \dim U = \dim U^\perp$, by Proposition 3.


Proposition 4 Let $V$ be a non-singular quadratic space. If $v_1, \dots, v_m$ are linearly independent vectors in $V$ then, for any $\eta_1, \dots, \eta_m \in F$, there exists a vector $v \in V$ such that $(v_j, v) = \eta_j$ $(1 \le j \le m)$.

Moreover, if $U$ is any subspace of $V$, then

(i) $\dim U + \dim U^\perp = \dim V$;
(ii) $U^{\perp\perp} = U$;
(iii) $U^\perp$ is non-singular if and only if $U$ is non-singular.


Proof There exist vectors $v_{m+1}, \dots, v_n \in V$ such that $v_1, \dots, v_n$ form a basis for $V$. If we put $\alpha_{jk} = (v_j, v_k)$ then, since $V$ is non-singular, the $n \times n$ symmetric matrix $A = (\alpha_{jk})$ is non-singular. Hence, for any $\eta_1, \dots, \eta_n \in F$, there exist unique $\xi_1, \dots, \xi_n \in F$ such that $v = \xi_1 v_1 + \cdots + \xi_n v_n$ satisfies

$$(v_1, v) = \eta_1, \ \dots, \ (v_n, v) = \eta_n.$$

This proves the first part of the proposition.

By taking $U = \langle v_1, \dots, v_m \rangle$ and $\eta_1 = \cdots = \eta_m = 0$, we see that $\dim U^\perp = n - m$. Replacing $U$ by $U^\perp$, we obtain $\dim U^{\perp\perp} = \dim U$. Since it is obvious that $U \subseteq U^{\perp\perp}$, this implies $U = U^{\perp\perp}$. Since $U$ non-singular means $U \cap U^\perp = \{0\}$, (iii) follows at once from (ii).


We now introduce some further definitions. A vector $u$ is said to be isotropic if $u \neq 0$ and $(u,u) = 0$. A subspace $U$ of $V$ is said to be isotropic if it contains an isotropic vector and anisotropic otherwise. A subspace $U$ of $V$ is said to be totally isotropic if every nonzero vector in $U$ is isotropic, i.e. if $U \subseteq U^\perp$. According to these definitions, the trivial subspace $\{0\}$ is both anisotropic and totally isotropic.

A quadratic space $V$ over a field $F$ is said to be universal if it represents every $\gamma \in F^\times$, i.e. if for each $\gamma \in F^\times$ there is a vector $v \in V$ such that $(v,v) = \gamma$.


Proposition 5 If a non-singular quadratic space $V$ is isotropic, then it is universal.


Proof Since $V$ is isotropic, it contains a vector $u \neq 0$ such that $(u,u) = 0$. Since $V$ is non-singular, it contains a vector $w$ such that $(u,w) \neq 0$. Then $w$ is linearly independent of $u$ and by replacing $w$ by a scalar multiple we may assume $(u,w) = 1$. If $v = \alpha u + w$, then $(v,v) = \gamma$ for $\alpha = \{\gamma - (w,w)\}/2$.
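The proof of Proposition 5 is an explicit construction. As a sketch, assuming vectors are coordinate lists over $\mathbb{Q}$ and the space is given by a non-singular Gram matrix `A` (the helper names `form` and `represent` are hypothetical), it can be coded as follows; the example uses the isotropic form $\xi^2 + \eta^2 - \zeta^2$.

```python
from fractions import Fraction


def form(A, x, y):
    """The symmetric bilinear form (x, y) = x^t A y."""
    n = len(A)
    return sum(Fraction(x[i]) * A[i][j] * y[j] for i in range(n) for j in range(n))


def represent(A, u, gamma):
    """Return v with (v, v) = gamma, given a non-singular Gram matrix A and an
    isotropic vector u; follows the proof of Proposition 5."""
    n = len(A)
    if not any(u) or form(A, u, u) != 0:
        raise ValueError("u must be an isotropic vector")
    basis = [[Fraction(int(i == k)) for k in range(n)] for i in range(n)]
    # some standard basis vector w has (u, w) != 0, since A is non-singular
    w = next(e for e in basis if form(A, u, e) != 0)
    s = form(A, u, w)
    w = [x / s for x in w]                          # now (u, w) = 1
    alpha = (Fraction(gamma) - form(A, w, w)) / 2   # (v, v) = 2*alpha + (w, w)
    return [alpha * a + b for a, b in zip(u, w)]


A = [[1, 0, 0], [0, 1, 0], [0, 0, -1]]   # xi^2 + eta^2 - zeta^2
u = [1, 0, 1]                            # isotropic: 1 + 0 - 1 = 0
v = represent(A, u, 7)
print([str(x) for x in v], form(A, v, v))   # ['4', '0', '3'] 7
```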


On the other hand, a non-singular universal quadratic space need not be isotropic. As an example, take $F$ to be the finite field with three elements and $V$ the 2-dimensional quadratic space corresponding to the quadratic form $\xi^2 + \eta^2$.
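A brute-force check of this example, modelling the field with three elements as the integers mod 3: the form $\xi^2 + \eta^2$ takes both nonzero values, but vanishes only at the zero vector.

```python
from itertools import product

p = 3  # the field with three elements, modelled as the integers mod 3


def q(x, y):
    """The quadratic form xi^2 + eta^2 over F_3."""
    return (x * x + y * y) % p


represented = {q(x, y) for x, y in product(range(p), repeat=2)} - {0}
isotropic = [(x, y) for x, y in product(range(p), repeat=2)
             if (x, y) != (0, 0) and q(x, y) == 0]
print(represented, isotropic)   # {1, 2} []
```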


Proposition 6 A non-singular quadratic form $f(\xi_1, \dots, \xi_n)$ with coefficients from a field $F$ represents $\gamma \in F^\times$ if and only if the quadratic form

$$g(\xi_0, \xi_1, \dots, \xi_n) = -\gamma\xi_0^2 + f(\xi_1, \dots, \xi_n)$$

is isotropic.


Proof Obviously if $f(x_1, \dots, x_n) = \gamma$ and $x_0 = 1$, then $g(x_0, x_1, \dots, x_n) = 0$. Suppose on the other hand that $g(x_0, x_1, \dots, x_n) = 0$ for some $x_j \in F$, not all zero. If $x_0 \neq 0$, then $f$ certainly represents $\gamma$. If $x_0 = 0$, then $f$ is isotropic and hence, by Proposition 5, it still represents $\gamma$.


Proposition 7 Let $V$ be a non-singular isotropic quadratic space. If $V = U \perp W$, then there exists $\gamma \in F^\times$ such that, for some $u \in U$ and $w \in W$,

$$(u,u) = \gamma, \qquad (w,w) = -\gamma.$$
Proof Since $V$ is non-singular, so also are $U$ and $W$, and since $V$ contains an isotropic vector $v'$, there exist $u' \in U$, $w' \in W$, not both zero, such that

$$(u',u') = -(w',w').$$

If this common value is nonzero, we are finished. Otherwise either $U$ or $W$ is isotropic. Without loss of generality, suppose $U$ is isotropic. Since $W$ is non-singular, it contains a vector $w$ such that $(w,w) \neq 0$, and $U$ contains a vector $u$ such that $(u,u) = -(w,w)$, by Proposition 5.


We now show that the totally isotropic subspaces of a quadratic space are impor- 
tant for an understanding of its structure, even though they are themselves trivial as 
quadratic spaces. 


Proposition 8 All maximal totally isotropic subspaces of a quadratic space have the 
same dimension. 


Proof Let $U_1$ be a maximal totally isotropic subspace of the quadratic space $V$. Then $U_1 \subseteq U_1^\perp$ and $U_1^\perp \setminus U_1$ contains no isotropic vector. Since $V^\perp \subseteq U_1^\perp$, it follows that $V^\perp \subseteq U_1$. If $V'$ is a subspace of $V$ supplementary to $V^\perp$, then $V'$ is non-singular and $U_1 = V^\perp + U_1'$, where $U_1' \subseteq V'$. Since $U_1'$ is a maximal totally isotropic subspace of $V'$, this shows that it is sufficient to establish the result when $V$ itself is non-singular.

Let $U_2$ be another maximal totally isotropic subspace of $V$. Put $W = U_1 \cap U_2$ and let $W_1$, $W_2$ be subspaces supplementary to $W$ in $U_1$, $U_2$ respectively. We are going to show that $W_2 \cap W_1^\perp = \{0\}$.

Let $v \in W_2 \cap W_1^\perp$. Since $W_2 \subseteq U_2$, $v$ is isotropic and $v \in U_2 \subseteq U_2^\perp \subseteq W^\perp$. Hence $v \in (W + W_1)^\perp = U_1^\perp$ and actually $v \in U_1$, since $v$ is isotropic. Since $W_2 \subseteq U_2$ this implies $v \in W$, and since $W \cap W_2 = \{0\}$ this implies $v = 0$.

It follows that $\dim W_2 + \dim W_1^\perp \le \dim V$. But, since $V$ is now assumed non-singular, $\dim W_1^\perp = \dim V - \dim W_1$, by Proposition 4. Hence $\dim W_2 \le \dim W_1$ and, for the same reason, $\dim W_1 \le \dim W_2$. Thus $\dim W_2 = \dim W_1$, and hence $\dim U_2 = \dim U_1$.


We define the index, ind V, of a quadratic space V to be the dimension of any 
maximal totally isotropic subspace. Thus V is anisotropic if and only if ind V = 0. 

A field $F$ is said to be ordered if it contains a subset $P$ of positive elements, which is closed under addition and multiplication, such that $F$ is the disjoint union of the sets $\{0\}$, $P$ and $-P = \{-x : x \in P\}$. The rational field $\mathbb{Q}$ and the real field $\mathbb{R}$ are ordered fields, with the usual interpretation of 'positive'. For quadratic spaces over an ordered field there are other useful notions of index.

A subspace $U$ of a quadratic space $V$ over an ordered field $F$ is said to be positive definite if $(u,u) > 0$ for all nonzero $u \in U$ and negative definite if $(u,u) < 0$ for all nonzero $u \in U$. Evidently positive definite and negative definite subspaces are anisotropic.


Proposition 9 All maximal positive definite subspaces of a quadratic space V over an 
ordered field F have the same dimension. 
Proof Let $U_+$ be a maximal positive definite subspace of the quadratic space $V$. Since $U_+$ is certainly non-singular, we have $V = U_+ \perp W$, where $W = U_+^\perp$, and since $U_+$ is maximal, $(w,w) \le 0$ for all $w \in W$. Since $U_+ \subseteq V$, we have $V^\perp \subseteq W$. If $U_-$ is a maximal negative definite subspace of $W$, then in the same way $W = U_- \perp U_0$, where $U_0 = U_-^\perp \cap W$. Evidently $U_0$ is totally isotropic and $U_0 \subseteq V^\perp$. In fact $U_0 = V^\perp$, since $U_- \cap V^\perp = \{0\}$. Since $(v,v) \ge 0$ for all $v \in U_+ \perp V^\perp$, it follows that $U_-$ is a maximal negative definite subspace of $V$.

If $U_+'$ is another maximal positive definite subspace of $V$, then $U_+' \cap W = \{0\}$ and hence

$$\dim U_+' + \dim W = \dim(U_+' + W) \le \dim V.$$

Thus $\dim U_+' \le \dim V - \dim W = \dim U_+$. But $U_+$ and $U_+'$ can be interchanged.


If $V$ is a quadratic space over an ordered field $F$, we define the positive index $\operatorname{ind}^+ V$ to be the dimension of any maximal positive definite subspace. Similarly all maximal negative definite subspaces have the same dimension, which we will call the negative index of $V$ and denote by $\operatorname{ind}^- V$. The proof of Proposition 9 shows that

$$\operatorname{ind}^+ V + \operatorname{ind}^- V + \dim V^\perp = \dim V.$$


Proposition 10 Let $F$ denote the real field $\mathbb{R}$ or, more generally, an ordered field in which every positive element is a square. Then any non-singular quadratic form $f$ in $n$ variables with coefficients from $F$ is equivalent over $F$ to a quadratic form

$$g = \xi_1^2 + \cdots + \xi_p^2 - \xi_{p+1}^2 - \cdots - \xi_n^2,$$

where $p \in \{0, 1, \dots, n\}$ is uniquely determined by $f$. In fact,

$$\operatorname{ind}^+ f = p, \quad \operatorname{ind}^- f = n - p, \quad \operatorname{ind} f = \min(p, n - p).$$

Proof By Proposition 2, $f$ is equivalent over $F$ to a diagonal form $\delta_1\eta_1^2 + \cdots + \delta_n\eta_n^2$, where $\delta_j \neq 0$ $(1 \le j \le n)$. We may choose the notation so that $\delta_j > 0$ for $j \le p$ and $\delta_j < 0$ for $j > p$. The change of variables $\xi_j = \delta_j^{1/2}\eta_j$ $(j \le p)$, $\xi_j = (-\delta_j)^{1/2}\eta_j$ $(j > p)$ now brings $f$ to the form $g$. Since the corresponding quadratic space has a $p$-dimensional maximal positive definite subspace, $p = \operatorname{ind}^+ f$ is uniquely determined. Similarly $n - p = \operatorname{ind}^- f$, and the formula for $\operatorname{ind} f$ follows readily.
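Proposition 10 gives a practical recipe for the indices of a real form with rational coefficients: diagonalize over $\mathbb{Q}$ as in Proposition 2 and count signs. No square roots are needed, since only the signs of the $\delta_j$ matter. The sketch below assumes a non-singular input; `diagonal_entries` and `inertia` are hypothetical helper names.

```python
from fractions import Fraction


def diagonal_entries(A):
    """Diagonal entries of a diagonal form congruent to the symmetric matrix A over Q
    (symmetric Gaussian elimination, as in Proposition 2)."""
    M = [[Fraction(x) for x in row] for row in A]
    n = len(M)

    def add(j, k, c):                    # basis vector j += c * basis vector k
        for i in range(n):
            M[i][j] += c * M[i][k]
        for i in range(n):
            M[j][i] += c * M[k][i]

    for k in range(n):
        if M[k][k] == 0:
            j = next((j for j in range(k + 1, n) if M[j][j] != 0), None)
            if j is not None:            # swap basis vectors j and k
                for row in M:
                    row[j], row[k] = row[k], row[j]
                M[j], M[k] = M[k], M[j]
            else:
                j = next((j for j in range(k + 1, n) if M[k][j] != 0), None)
                if j is None:
                    continue
                add(k, j, Fraction(1))   # (e_k + e_j, e_k + e_j) = 2 M[k][j] != 0
        for j in range(k + 1, n):
            add(j, k, -M[j][k] / M[k][k])
    return [M[i][i] for i in range(n)]


def inertia(A):
    """(ind+, ind-, ind) for a non-singular real symmetric matrix with rational entries."""
    d = diagonal_entries(A)
    p = sum(1 for x in d if x > 0)
    return p, len(d) - p, min(p, len(d) - p)


print(inertia([[0, 1, 0], [1, 0, 0], [0, 0, 3]]))   # (2, 1, 1): a hyperbolic plane plus <3>
```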


It follows that, for quadratic spaces over a field of the type considered in Proposi- 
tion 10, a subspace is anisotropic if and only if it is either positive definite or negative 
definite. 

Proposition 10 completely solves the problem of equivalence for real quadratic 
forms. (The uniqueness of p is known as Sylvester’s law of inertia.) It will now be 
shown that the problem of equivalence for quadratic forms over a finite field can also 
be completely solved. 


Lemma 11 If $V$ is a non-singular 2-dimensional quadratic space over a finite field $\mathbb{F}_q$, of (odd) cardinality $q$, then $V$ is universal.
Proof By choosing an orthogonal basis for $V$ we are reduced to showing that if $\alpha, \beta, \gamma \in \mathbb{F}_q^\times$, then there exist $\xi, \eta \in \mathbb{F}_q$ such that $\alpha\xi^2 + \beta\eta^2 = \gamma$. As $\xi$ runs through $\mathbb{F}_q$, $\alpha\xi^2$ takes $(q+1)/2 = 1 + (q-1)/2$ distinct values. Similarly, as $\eta$ runs through $\mathbb{F}_q$, $\gamma - \beta\eta^2$ takes $(q+1)/2$ distinct values. Since $(q+1)/2 + (q+1)/2 > q$, there exist $\xi, \eta \in \mathbb{F}_q$ for which $\alpha\xi^2$ and $\gamma - \beta\eta^2$ take the same value.
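The counting argument of Lemma 11 translates directly into a search. For simplicity the sketch takes $q = p$ prime, so that $\mathbb{F}_q$ can be modelled by the integers mod $p$ (a general $\mathbb{F}_q$ would need extension-field arithmetic, which is omitted); `solve_two_squares` is an illustrative name.

```python
def solve_two_squares(alpha, beta, gamma, p):
    """Find (xi, eta) in F_p with alpha*xi^2 + beta*eta^2 = gamma, for an odd prime p
    and alpha, beta, gamma nonzero mod p, by the counting argument of Lemma 11."""
    left = {}
    for xi in range(p):
        left.setdefault(alpha * xi * xi % p, xi)   # (p + 1)/2 distinct values
    for eta in range(p):
        target = (gamma - beta * eta * eta) % p     # (p + 1)/2 distinct values
        if target in left:                          # the two sets must meet
            return left[target], eta
    raise AssertionError("unreachable, since (p+1)/2 + (p+1)/2 > p")


xi, eta = solve_two_squares(1, 1, 3, 7)   # xi^2 + eta^2 = 3 in F_7
print(xi, eta)                            # 3 1, since 9 + 1 = 10 = 3 mod 7
```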


Proposition 12 Any non-singular quadratic form $f$ in $n$ variables over a finite field $\mathbb{F}_q$ is equivalent over $\mathbb{F}_q$ to the quadratic form

$$\xi_1^2 + \cdots + \xi_{n-1}^2 + \delta\xi_n^2,$$

where $\delta = \det f$ is the determinant of $f$.

There are exactly two equivalence classes of non-singular quadratic forms in $n$ variables over $\mathbb{F}_q$, one consisting of those forms $f$ whose determinant $\det f$ is a square in $\mathbb{F}_q^\times$, and the other those for which $\det f$ is not a square in $\mathbb{F}_q^\times$.


Proof Since the first statement of the proposition is trivial for $n = 1$, we assume that $n > 1$ and it holds for all smaller values of $n$. It follows from Lemma 11 that $f$ represents 1 and hence, by the remark after the proof of Proposition 2, $f$ is equivalent over $\mathbb{F}_q$ to a quadratic form $\xi_1^2 + g(\xi_2, \dots, \xi_n)$. Since $f$ and $g$ have the same determinant, the first statement of the proposition now follows from the induction hypothesis. Since $\mathbb{F}_q^\times$ contains $(q-1)/2$ distinct squares, every element of $\mathbb{F}_q^\times$ is either a square or a square times a fixed non-square. The second statement of the proposition now follows from the first.
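For small $p$ the classification in Proposition 12 can be confirmed by brute force when $n = 2$. The sketch below again takes $q = p$ prime (an assumption made only to keep the arithmetic to integers mod $p$). It partitions all non-singular symmetric $2 \times 2$ matrices over $\mathbb{F}_p$ into orbits under $A \mapsto T^t A T$, and it can be used to check that the orbit of $A$ is determined by whether $\det A$ is a square. The function names are hypothetical.

```python
from itertools import product


def congruence_classes(p):
    """Congruence classes of non-singular symmetric 2x2 matrices over F_p (p a small
    odd prime), found by applying every invertible change of basis T."""
    def mul(X, Y):
        return tuple(tuple(sum(X[i][k] * Y[k][j] for k in range(2)) % p for j in range(2))
                     for i in range(2))

    def transpose(X):
        return ((X[0][0], X[1][0]), (X[0][1], X[1][1]))

    forms = [((a, b), (b, c)) for a, b, c in product(range(p), repeat=3)
             if (a * c - b * b) % p != 0]
    changes = [((a, b), (c, d)) for a, b, c, d in product(range(p), repeat=4)
               if (a * d - b * c) % p != 0]
    classes, seen = [], set()
    for A in forms:
        if A not in seen:
            orbit = {mul(mul(transpose(T), A), T) for T in changes}   # all T^t A T
            seen |= orbit
            classes.append(orbit)
    return classes


def det_is_square(A, p):
    """Euler's criterion applied to det A (assumed nonzero mod p)."""
    d = (A[0][0] * A[1][1] - A[0][1] * A[1][0]) % p
    return pow(d, (p - 1) // 2, p) == 1


classes = congruence_classes(5)
print(len(classes))                                          # 2
print([{det_is_square(A, 5) for A in orbit} for orbit in classes])
```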


We now return to quadratic spaces over an arbitrary field. A 2-dimensional quadratic 
space is said to be a hyperbolic plane if it is non-singular and isotropic. 


Proposition 13 For a 2-dimensional quadratic space $V$, the following statements are equivalent:

(i) $V$ is a hyperbolic plane;
(ii) $V$ has a basis $u_1, u_2$ such that $(u_1,u_1) = (u_2,u_2) = 0$, $(u_1,u_2) = 1$;
(iii) $V$ has a basis $v_1, v_2$ such that $(v_1,v_1) = 1$, $(v_2,v_2) = -1$, $(v_1,v_2) = 0$;
(iv) $-\det V$ is a square in $F^\times$.


Proof Suppose first that $V$ is a hyperbolic plane and let $u_1$ be any isotropic vector in $V$. If $v$ is any linearly independent vector, then $(u_1,v) \neq 0$, since $V$ is non-singular. By replacing $v$ by a scalar multiple we may assume that $(u_1,v) = 1$. If we put $u_2 = v + \alpha u_1$, where $\alpha = -(v,v)/2$, then

$$(u_2,u_2) = (v,v) + 2\alpha = 0, \qquad (u_1,u_2) = (u_1,v) = 1,$$

and $u_1, u_2$ is a basis for $V$.

If $u_1, u_2$ are isotropic vectors in $V$ such that $(u_1,u_2) = 1$, then the vectors $v_1 = u_1 + u_2/2$ and $v_2 = u_1 - u_2/2$ satisfy (iii), and if $v_1, v_2$ satisfy (iii) then $\det V = -1$. Finally, if (iv) holds then $V$ is certainly non-singular. Let $w_1, w_2$ be an orthogonal basis for $V$ and put $\delta_j = (w_j,w_j)$ $(j = 1,2)$. By hypothesis, $\delta_1\delta_2 = -\gamma^2$, where $\gamma \in F^\times$. Since $\gamma w_1 + \delta_1 w_2$ is an isotropic vector, this proves that (iv) implies (i).
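The chain (iv) ⇒ (i) ⇒ (ii) in this proof is constructive. Assuming a diagonal form $\delta_1\xi^2 + \delta_2\eta^2$ over $\mathbb{Q}$ (so that squares can be tested exactly), the following sketch starts from the isotropic vector $\gamma w_1 + \delta_1 w_2$ and produces a basis $u_1, u_2$ as in (ii). The function names are hypothetical.

```python
from fractions import Fraction
from math import isqrt


def rational_sqrt(x):
    """Return a square root of x in Q if x is a square in Q, else None."""
    x = Fraction(x)
    if x < 0:
        return None
    rn, rd = isqrt(x.numerator), isqrt(x.denominator)
    if rn * rn == x.numerator and rd * rd == x.denominator:
        return Fraction(rn, rd)
    return None


def form(d, x, y):
    """(x, y) for the diagonal form d[0]*xi^2 + d[1]*eta^2."""
    return d[0] * x[0] * y[0] + d[1] * x[1] * y[1]


def hyperbolic_basis(d1, d2):
    """For d1*xi^2 + d2*eta^2 over Q with -d1*d2 a nonzero square, return isotropic
    u1, u2 with (u1, u2) = 1, following the proof of Proposition 13."""
    d = (Fraction(d1), Fraction(d2))
    g = rational_sqrt(-d[0] * d[1])
    if not g:
        raise ValueError("-det V is not a nonzero square: not a hyperbolic plane")
    u1 = (g, d[0])                            # gamma*w1 + delta1*w2 is isotropic
    v = (Fraction(0), 1 / (d[0] * d[1]))      # scaled so that (u1, v) = 1
    alpha = -form(d, v, v) / 2                # u2 = v + alpha*u1 is isotropic
    u2 = (v[0] + alpha * u1[0], v[1] + alpha * u1[1])
    return u1, u2


u1, u2 = hyperbolic_basis(1, -4)              # the form xi^2 - 4 eta^2
print([str(x) for x in u1 + u2])              # ['2', '1', '1/4', '-1/8']
```

From $u_1, u_2$ one obtains the basis of (iii) as $v_1 = u_1 + u_2/2$, $v_2 = u_1 - u_2/2$.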
Proposition 14 Let $V$ be a non-singular quadratic space. If $U$ is a totally isotropic subspace with basis $u_1, \dots, u_m$, then there exists a totally isotropic subspace $U'$ with basis $u_1', \dots, u_m'$ such that

$$(u_j, u_k') = 1 \text{ or } 0 \text{ according as } j = k \text{ or } j \neq k.$$

Hence $U \cap U' = \{0\}$ and

$$U + U' = H_1 \perp \cdots \perp H_m,$$

where $H_j$ is the hyperbolic plane with basis $u_j, u_j'$ $(1 \le j \le m)$.


Proof Suppose first that $m = 1$. Since $V$ is non-singular, there exists a vector $v \in V$ such that $(u_1,v) \neq 0$. The subspace $H_1$ spanned by $u_1, v$ is a hyperbolic plane and hence, by Proposition 13, it contains a vector $u_1'$ such that $(u_1',u_1') = 0$, $(u_1,u_1') = 1$. This proves the proposition for $m = 1$.

Suppose now that $m > 1$ and the result holds for all smaller values of $m$. Let $W$ be the totally isotropic subspace with basis $u_2, \dots, u_m$. By Proposition 4, there exists a vector $v \in W^\perp$ such that $(u_1,v) \neq 0$. The subspace $H_1$ spanned by $u_1, v$ is a hyperbolic plane and hence it contains a vector $u_1'$ such that $(u_1',u_1') = 0$, $(u_1,u_1') = 1$. Since $H_1$ is non-singular, $H_1^\perp$ is also non-singular and $V = H_1 \perp H_1^\perp$. Since $W \subseteq H_1^\perp$, the result now follows by applying the induction hypothesis to the subspace $W$ of the quadratic space $H_1^\perp$.


Proposition 15 Any quadratic space $V$ can be represented as an orthogonal sum

$$V = V^\perp \perp H_1 \perp \cdots \perp H_m \perp V_0,$$

where $H_1, \dots, H_m$ are hyperbolic planes and the subspace $V_0$ is anisotropic.

Proof Let $V_1$ be any subspace supplementary to $V^\perp$. Then $V_1$ is non-singular, by the definition of $V^\perp$. If $V_1$ is anisotropic, we can take $m = 0$ and $V_0 = V_1$. Otherwise $V_1$ contains an isotropic vector and hence also a hyperbolic plane $H_1$, by Proposition 14. By Proposition 3,

$$V_1 = H_1 \perp V_2,$$

where $V_2 = H_1^\perp \cap V_1$ is non-singular. If $V_2$ is anisotropic, we can take $V_0 = V_2$. Otherwise we repeat the process. After finitely many steps we must obtain a representation of the required form, possibly with $V_0 = \{0\}$.


Let $V$ and $V'$ be quadratic spaces over the same field $F$. The quadratic spaces $V$, $V'$ are said to be isometric if there exists a linear map $\varphi : V \to V'$ which is an isometry, i.e. it is bijective and

$$(\varphi v, \varphi v) = (v,v) \quad \text{for all } v \in V.$$

By (1), this implies

$$(\varphi u, \varphi v) = (u,v) \quad \text{for all } u, v \in V.$$
The concept of isometry is only another way of looking at equivalence. For if $\varphi : V \to V'$ is an isometry, then $V$ and $V'$ have the same dimension. If $u_1, \dots, u_n$ is a basis for $V$ and $u_1', \dots, u_n'$ a basis for $V'$ then, since $(u_j, u_k) = (\varphi u_j, \varphi u_k)$, the isometry is completely determined by the change of basis in $V'$ from $\varphi u_1, \dots, \varphi u_n$ to $u_1', \dots, u_n'$.


A particularly simple type of isometry is defined in the following way. Let $V$ be a quadratic space and $w$ a vector such that $(w,w) \neq 0$. The map $\tau : V \to V$ defined by

$$\tau v = v - \{2(v,w)/(w,w)\}w$$

is obviously linear. If $W$ is the non-singular one-dimensional subspace spanned by $w$, then $V = W \perp W^\perp$. Since $\tau v = v$ if $v \in W^\perp$ and $\tau v = -v$ if $v \in W$, it follows that $\tau$ is bijective. Writing $\alpha = -2(v,w)/(w,w)$, we have

$$(\tau v, \tau v) = (v,v) + 2\alpha(v,w) + \alpha^2(w,w) = (v,v).$$

Thus $\tau$ is an isometry. Geometrically, $\tau$ is a reflection in the hyperplane orthogonal to $w$. We will refer to $\tau = \tau_w$ as the reflection corresponding to the non-isotropic vector $w$.
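A sketch of the reflection $\tau_w$ for a space given by a Gram matrix over $\mathbb{Q}$ (the names `form` and `reflection` are illustrative). It fixes vectors orthogonal to $w$, sends $w$ to $-w$, and preserves $(v,v)$, in line with the computation above.

```python
from fractions import Fraction


def form(A, x, y):
    """(x, y) = x^t A y for a symmetric Gram matrix A."""
    n = len(A)
    return sum(x[i] * A[i][j] * y[j] for i in range(n) for j in range(n))


def reflection(A, w):
    """The reflection tau_w : v -> v - (2 (v, w)/(w, w)) w, for non-isotropic w."""
    ww = form(A, w, w)
    if ww == 0:
        raise ValueError("w must satisfy (w, w) != 0")

    def tau(v):
        c = Fraction(2) * form(A, v, w) / ww
        return [vi - c * wi for vi, wi in zip(v, w)]

    return tau


A = [[1, 0, 0], [0, 1, 0], [0, 0, -1]]   # xi^2 + eta^2 - zeta^2
w = [1, 2, 0]                            # (w, w) = 5
tau = reflection(A, w)
print([str(x) for x in tau(w)])          # ['-1', '-2', '0']
```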


Proposition 16 If $u, u'$ are vectors of a quadratic space $V$ such that $(u,u) = (u',u') \neq 0$, then there exists an isometry $\varphi : V \to V$ such that $\varphi u = u'$.

Proof Since

$$(u+u', u+u') + (u-u', u-u') = 2(u,u) + 2(u',u') = 4(u,u),$$

at least one of the vectors $u + u'$, $u - u'$ is not isotropic. If $u - u'$ is not isotropic, the reflection $\tau$ corresponding to $w = u - u'$ has the property $\tau u = u'$, since $(u-u', u-u') = 2(u, u-u')$. If $u + u'$ is not isotropic, the reflection $\tau$ corresponding to $w = u + u'$ has the property $\tau u = -u'$. Since $u'$ is not isotropic, the corresponding reflection $\sigma$ maps $u'$ onto $-u'$, and hence the isometry $\sigma\tau$ maps $u$ onto $u'$.
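The proof of Proposition 16 yields an explicit isometry built from at most two reflections. Here is a minimal sketch over $\mathbb{Q}$, assuming $(u,u) = (u',u') \neq 0$ (this hypothesis is not checked); the helper names are hypothetical.

```python
from fractions import Fraction


def form(A, x, y):
    """(x, y) = x^t A y for a symmetric Gram matrix A."""
    n = len(A)
    return sum(x[i] * A[i][j] * y[j] for i in range(n) for j in range(n))


def reflection(A, w):
    """tau_w : v -> v - (2 (v, w)/(w, w)) w, for non-isotropic w."""
    ww = form(A, w, w)

    def tau(v):
        c = Fraction(2) * form(A, v, w) / ww
        return [vi - c * wi for vi, wi in zip(v, w)]

    return tau


def isometry_taking(A, u, u2):
    """An isometry phi with phi(u) = u2, assuming (u, u) = (u2, u2) != 0; it is one
    reflection or a product of two, as in the proof of Proposition 16."""
    diff = [a - b for a, b in zip(u, u2)]
    if form(A, diff, diff) != 0:
        return reflection(A, diff)                           # tau_{u - u2}: u -> u2
    tau = reflection(A, [a + b for a, b in zip(u, u2)])      # tau_{u + u2}: u -> -u2
    sigma = reflection(A, u2)                                # sigma: -u2 -> u2
    return lambda v: sigma(tau(v))


A = [[1, 0, 0], [0, 1, 0], [0, 0, -1]]
phi = isometry_taking(A, [1, 0, 0], [1, 1, 1])   # here u - u2 is isotropic
print([str(x) for x in phi([1, 0, 0])])          # ['1', '1', '1']
```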


The proof of Proposition 16 has the following interesting consequence: 


Proposition 17 Any isometry $\varphi : V \to V$ of a non-singular quadratic space $V$ is a product of reflections.

Proof Let $u_1, \dots, u_n$ be an orthogonal basis for $V$. By Proposition 16 and its proof, there exists an isometry $\psi$, which is either a reflection or a product of two reflections, such that $\psi u_1 = \varphi u_1$. If $U$ is the subspace with basis $u_1$ and $W$ the subspace with basis $u_2, \dots, u_n$, then $V = U \perp W$ and $W = U^\perp$ is non-singular. Since the isometry $\varphi_1 = \psi^{-1}\varphi$ fixes $u_1$, we have also $\varphi_1 W = W$. But if $\sigma : W \to W$ is a reflection, the extension $\tau : V \to V$ defined by $\tau u = u$ if $u \in U$, $\tau w = \sigma w$ if $w \in W$, is also a reflection. By using induction on the dimension $n$, it follows that $\varphi_1$ is a product of reflections, and hence so also is $\varphi = \psi\varphi_1$.


By a more elaborate argument E. Cartan (1938) showed that any isometry of an 
n-dimensional non-singular quadratic space is a product of at most n reflections. 
Proposition 18 Let $V$ be a quadratic space with two orthogonal sum representations

$$V = U \perp W = U' \perp W'.$$

If there exists an isometry $\varphi : U \to U'$, then there exists an isometry $\psi : V \to V$ such that $\psi u = \varphi u$ for all $u \in U$ and $\psi W = W'$. Thus if $U$ is isometric to $U'$, then $W$ is isometric to $W'$.


Proof Let $u_1, \dots, u_m$ and $u_{m+1}, \dots, u_n$ be bases for $U$ and $W$ respectively. If $u_j' = \varphi u_j$ $(1 \le j \le m)$, then $u_1', \dots, u_m'$ is a basis for $U'$. Let $u_{m+1}', \dots, u_n'$ be a basis for $W'$. The symmetric matrices associated with the bases $u_1, \dots, u_n$ and $u_1', \dots, u_n'$ of $V$ have the form

$$\begin{pmatrix} A & O \\ O & B \end{pmatrix}, \qquad \begin{pmatrix} A & O \\ O & C \end{pmatrix},$$

which we will write as $A \oplus B$, $A \oplus C$. Thus the two matrices $A \oplus B$, $A \oplus C$ are congruent. It is enough to show that this implies that $B$ and $C$ are congruent. For suppose $C = S^t B S$ for some invertible matrix $S = (\sigma_{ij})$. If we define $u_{m+1}'', \dots, u_n''$ by

$$u_i'' = \sum_{j=m+1}^{n} \sigma_{ji} u_j \qquad (m+1 \le i \le n),$$

then $(u_j'', u_k'') = (u_j', u_k')$ $(m+1 \le j,k \le n)$ and the linear map $\psi : V \to V$ defined by

$$\psi u_j = u_j' \ (1 \le j \le m), \qquad \psi u_j'' = u_j' \ (m+1 \le j \le n),$$

is the required isometry.

By taking the bases for $U$, $W$, $W'$ to be orthogonal bases we are reduced to the case in which $A$, $B$, $C$ are diagonal matrices. We may choose the notation so that $A = \mathrm{diag}[\alpha_1, \dots, \alpha_m]$, where $\alpha_j \neq 0$ for $j \le r$ and $\alpha_j = 0$ for $j > r$. If $\alpha_1 \neq 0$, i.e. if $r > 0$, and if we write $A' = \mathrm{diag}[\alpha_2, \dots, \alpha_m]$, then it follows from Propositions 1 and 16 that the matrices $A' \oplus B$ and $A' \oplus C$ are congruent. Proceeding in this way, we are reduced to the case $A = O$.

Thus we now suppose $A = O$. We may assume $B \neq O$, $C \neq O$, since otherwise the result is obvious. We may choose the notation also so that $B = O_s \oplus B'$ and $C = O_s \oplus C'$, where $B'$ is non-singular and $0 \le s < n - m$. If $T^t(O_{m+s} \oplus C')T = O_{m+s} \oplus B'$, where

$$T = \begin{pmatrix} T_1 & T_2 \\ T_3 & T_4 \end{pmatrix},$$

then $T_4^t C' T_4 = B'$. Since $B'$ is non-singular, so also is $T_4$, and thus $B'$ and $C'$ are congruent. It follows that $B$ and $C$ are also congruent.


Corollary 19 If a non-singular subspace $U$ of a quadratic space $V$ is isometric to another subspace $U'$, then $U^\perp$ is isometric to $U'^\perp$.
Proof This follows at once from Proposition 18, since $U'$ is also non-singular and

$$V = U \perp U^\perp = U' \perp U'^\perp.$$


The first statement of Proposition 18 is known as Witt’s extension theorem and 
the second statement as Witt’s cancellation theorem. It was Corollary 19 which was 
actually proved by Witt (1937). 

There is also another version of the extension theorem, stating that if $\varphi : U \to U'$ is an isometry between two subspaces $U$, $U'$ of a non-singular quadratic space $V$, then there exists an isometry $\psi : V \to V$ such that $\psi u = \varphi u$ for all $u \in U$. For non-singular $U$ this has just been proved, and the singular case can be reduced to the non-singular by applying (several times, if necessary) the following lemma.


Lemma 20 Let $V$ be a non-singular quadratic space. If $U$, $U'$ are singular subspaces of $V$ and if there exists an isometry $\varphi : U \to U'$, then there exist subspaces $\bar U$, $\bar U'$ properly containing $U$, $U'$ respectively and an isometry $\bar\varphi : \bar U \to \bar U'$ such that $\bar\varphi u = \varphi u$ for all $u \in U$.


Proof By hypothesis there exists a nonzero vector $u_1 \in U \cap U^\perp$. Then $U$ has a basis $u_1, \dots, u_m$ with $u_1$ as first vector. By Proposition 4, there exists a vector $w \in V$ such that

$$(u_1, w) = 1, \qquad (u_j, w) = 0 \quad \text{for } 1 < j \le m.$$

Moreover we may assume that $(w,w) = 0$, by replacing $w$ by $w - \alpha u_1$, with $\alpha = (w,w)/2$. If $W$ is the 1-dimensional subspace spanned by $w$, then $U \cap W = \{0\}$ and $\bar U = U + W$ contains $U$ properly.

The same construction can be applied to $U'$, with the basis $\varphi u_1, \dots, \varphi u_m$, to obtain an isotropic vector $w'$ and a subspace $\bar U' = U' + W'$. The linear map $\bar\varphi : \bar U \to \bar U'$ defined by

$$\bar\varphi u_j = \varphi u_j \ (1 \le j \le m), \qquad \bar\varphi w = w',$$

is easily seen to have the required properties.


As an application of Proposition 18, we will consider the uniqueness of the repre- 
sentation obtained in Proposition 15. 


Proposition 21 Suppose the quadratic space $V$ can be represented as an orthogonal sum

$$V = U \perp H \perp V_0,$$

where $U$ is totally isotropic, $H$ is the orthogonal sum of $m$ hyperbolic planes, and the subspace $V_0$ is anisotropic.

Then $U = V^\perp$, $m = \operatorname{ind} V - \dim V^\perp$, and $V_0$ is uniquely determined up to an isometry.
isometry. 


Proof Since $H$ and $V_0$ are non-singular, so also is $W = H \perp V_0$. Hence, by the remark after the proof of Proposition 3, $U = W^\perp$. Since $U \subseteq U^\perp$, it follows that $U \subseteq V^\perp$. In fact $U = V^\perp$, since $W \cap V^\perp = \{0\}$.
The subspace $H$ has two $m$-dimensional totally isotropic subspaces $U_1$, $U_1'$ such that

$$H = U_1 + U_1', \qquad U_1 \cap U_1' = \{0\}.$$

Evidently $V_1 := V^\perp + U_1$ is a totally isotropic subspace of $V$. In fact $V_1$ is maximal, since any isotropic vector in $U_1 \perp V_0$ is contained in $U_1$. Thus $m = \operatorname{ind} V - \dim V^\perp$ is uniquely determined and $H$ is uniquely determined up to an isometry. If also

$$V = V^\perp \perp H' \perp V_0',$$

where $H'$ is the orthogonal sum of $m$ hyperbolic planes and $V_0'$ is anisotropic then, by Proposition 18, $V_0$ is isometric to $V_0'$.


Proposition 21 reduces the problem of equivalence for quadratic forms over an ar- 
bitrary field to the case of anisotropic forms. As we will see, this can still be a difficult 
problem, even for the field of rational numbers. 

Two quadratic spaces $V$, $V'$ over the same field $F$ may be said to be Witt-equivalent, in symbols $V \approx V'$, if their anisotropic components $V_0$, $V_0'$ are isometric. This is certainly an equivalence relation. The cancellation law makes it possible to define various algebraic operations on the set $\mathcal{W}(F)$ of all quadratic spaces over the field $F$, with equality replaced by Witt-equivalence. If we define $-V$ to be the quadratic space with the same underlying vector space as $V$ but with $(v_1, v_2)$ replaced by $-(v_1, v_2)$, then

$$V \perp (-V) \approx \{0\}.$$

If we define the sum of two quadratic spaces $V$ and $W$ to be $V \perp W$, then

$$V \approx V', \ W \approx W' \ \Rightarrow \ V \perp W \approx V' \perp W'.$$

Similarly, if we define the product of $V$ and $W$ to be the tensor product $V \otimes W$ of the underlying vector spaces with the quadratic space structure defined by

$$(v_1 \otimes w_1, v_2 \otimes w_2) = (v_1, v_2)(w_1, w_2),$$

then

$$V \approx V', \ W \approx W' \ \Rightarrow \ V \otimes W \approx V' \otimes W'.$$

It is readily seen that in this way $\mathcal{W}(F)$ acquires the structure of a commutative ring, the Witt ring of the field $F$.
Again let $F$ be any field of characteristic $\neq 2$ and $F^\times$ the multiplicative group of all nonzero elements of $F$. We define the Hilbert symbol $(a,b)_F$, where $a, b \in F^\times$, by

$$(a,b)_F = \begin{cases} 1 & \text{if there exist } x, y \in F \text{ such that } ax^2 + by^2 = 1, \\ -1 & \text{otherwise.} \end{cases}$$

By Proposition 6, $(a,b)_F = 1$ if and only if the ternary quadratic form $a\xi^2 + b\eta^2 - \zeta^2$ is isotropic.
The following lemma shows that the Hilbert symbol can also be defined in an asymmetric way:

Lemma 22 For any field $F$ and any $a, b \in F^\times$, $(a,b)_F = 1$ if and only if the binary quadratic form $f_a = \xi^2 - a\eta^2$ represents $b$. Moreover, for any $a \in F^\times$, the set $G_a$ of all $b \in F^\times$ which are represented by $f_a$ is a subgroup of $F^\times$.


Proof Suppose first that $ax^2 + by^2 = 1$ for some $x, y \in F$. If $a$ is a square, the quadratic form $f_a$ is isotropic and hence universal. If $a$ is not a square, then $y \neq 0$ and

$$(y^{-1})^2 - a(xy^{-1})^2 = b.$$

Suppose next that $u^2 - av^2 = b$ for some $u, v \in F$. If $-ba^{-1}$ is a square, the quadratic form $a\xi^2 + b\eta^2$ is isotropic and hence universal. If $-ba^{-1}$ is not a square, then $u \neq 0$ and $a(vu^{-1})^2 + b(u^{-1})^2 = 1$.

It is obvious that if $b \in G_a$, then also $b^{-1} \in G_a$, and it is easily verified that if

$$\zeta_1 = \xi_1\eta_1 + a\xi_2\eta_2, \qquad \zeta_2 = \xi_1\eta_2 + \xi_2\eta_1,$$

then

$$\zeta_1^2 - a\zeta_2^2 = (\xi_1^2 - a\xi_2^2)(\eta_1^2 - a\eta_2^2).$$

(In fact this is just Brahmagupta's identity, already encountered in §4 of Chapter IV.) It follows that $G_a$ is a subgroup of $F^\times$.
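Brahmagupta's identity gives an explicit rule for multiplying representations by $f_a$, which is the heart of the proof that $G_a$ is a group. A small sketch with integer or rational inputs (the names `compose` and `invert` are ours):

```python
from fractions import Fraction


def f(a, x, y):
    """The binary form f_a(x, y) = x^2 - a*y^2."""
    return x * x - a * y * y


def compose(a, r, s):
    """If b = f_a(*r) and c = f_a(*s), return (z1, z2) with f_a(z1, z2) = b*c
    (Brahmagupta's identity)."""
    (x1, x2), (y1, y2) = r, s
    return x1 * y1 + a * x2 * y2, x1 * y2 + x2 * y1


def invert(a, r):
    """If b = f_a(*r) != 0, return a representation of 1/b: f_a(x/b, y/b) = 1/b."""
    b = f(a, *r)
    return Fraction(r[0], b), Fraction(r[1], b)


a = 2
r, s = (3, 1), (5, 2)               # 7 = 3^2 - 2*1^2 and 17 = 5^2 - 2*2^2
z = compose(a, r, s)
print(z, f(a, *z))                  # (19, 11) 119
```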


Proposition 23 For any field $F$, the Hilbert symbol has the following properties:

(i) $(a,b)_F = (b,a)_F$,
(ii) $(a,bc^2)_F = (a,b)_F$ for any $c \in F^\times$,
(iii) $(a,1)_F = 1$,
(iv) $(a,-ab)_F = (a,b)_F$,
(v) if $(a,b)_F = 1$, then $(a,bc)_F = (a,c)_F$ for any $c \in F^\times$.


Proof The first three properties follow immediately from the definition. The fourth property follows from Lemma 22. For, since $G_a$ is a group and $f_a$ represents $-a$, $f_a$ represents $-ab$ if and only if it represents $b$. The proof of (v) is similar: if $f_a$ represents $b$, then it represents $bc$ if and only if it represents $c$.


The Hilbert symbol will now be evaluated for the real field $\mathbb{R} = \mathbb{Q}_\infty$ and the $p$-adic fields $\mathbb{Q}_p$ studied in Chapter VI. In these cases it will be denoted simply by $(a,b)_\infty$, resp. $(a,b)_p$. For the real field, we obtain at once from the definition of the Hilbert symbol


Proposition 24 Let $a, b \in \mathbb{R}^\times$. Then $(a,b)_\infty = -1$ if and only if both $a < 0$ and $b < 0$.


To evaluate $(a,b)_p$, we first note that we can write $a = p^\alpha a'$, $b = p^\beta b'$, where $\alpha, \beta \in \mathbb{Z}$ and $|a'|_p = |b'|_p = 1$. It follows from (i), (ii) of Proposition 23 that we may assume $\alpha, \beta \in \{0,1\}$. Furthermore, by (ii), (iv) of Proposition 23 we may assume that $\alpha$ and $\beta$ are not both 1. Thus we are reduced to the case where $a$ is a $p$-adic unit and either $b$ is a $p$-adic unit or $b = pb'$, where $b'$ is a $p$-adic unit. To evaluate $(a,b)_p$ under these assumptions we will use the conditions for a $p$-adic unit to be a square which were derived in Chapter VI. It is convenient to treat the case $p = 2$ separately.
Proposition 25 Let $p$ be an odd prime and $a, b \in \mathbb{Q}_p$ with $|a|_p = |b|_p = 1$. Then

(i) $(a,b)_p = 1$,
(ii) $(a,pb)_p = 1$ if and only if $a = c^2$ for some $c \in \mathbb{Q}_p$.

In particular, for any integers $a, b$ not divisible by $p$, $(a,b)_p = 1$ and $(a,pb)_p = (a/p)$, where $(a/p)$ is the Legendre symbol.


Proof Let $S \subseteq \mathbb{Z}_p$ be a set of representatives, with $0 \in S$, of the finite residue field $\mathbb{F}_p = \mathbb{Z}_p/p\mathbb{Z}_p$. There exist nonzero $a_0, b_0 \in S$ such that

$$|a - a_0|_p < 1, \qquad |b - b_0|_p < 1.$$

But Lemma 11 implies that there exist $x_0, y_0 \in S$ such that

$$|a_0x_0^2 + b_0y_0^2 - 1|_p < 1.$$

Since $|x_0|_p \le 1$, $|y_0|_p \le 1$, it follows that

$$|ax_0^2 + by_0^2 - 1|_p < 1.$$

Hence, by Proposition VI.16, $ax_0^2 + by_0^2 = z^2$ for some $z \in \mathbb{Q}_p$. Since $z \neq 0$, this implies $(a,b)_p = 1$. This proves (i).

If $a = c^2$ for some $c \in \mathbb{Q}_p$, then $(a,pb)_p = 1$, by Proposition 23. Conversely, suppose there exist $x, y \in \mathbb{Q}_p$ such that $ax^2 + pby^2 = 1$. Then $|ax^2|_p \neq |pby^2|_p$, since $|a|_p = |b|_p = 1$. It follows that $|x|_p = 1$, $|y|_p \le 1$. Thus $|ax^2 - 1|_p < 1$ and hence $ax^2 = z^2$ for some $z \in \mathbb{Q}_p$. This proves (ii).

The special case where $a$ and $b$ are integers now follows from Corollary VI.17.
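For integer arguments, the reduction steps described before Proposition 25, together with Proposition 25 itself, give a complete algorithm for $(a,b)_p$ with $p$ odd. The sketch below follows those steps literally and computes the Legendre symbol by Euler's criterion. It assumes $a, b \neq 0$ and $p$ an odd prime; the function names are illustrative.

```python
def split(n, p):
    """Write the nonzero integer n as p**e * u with p not dividing u; return (e, u)."""
    e = 0
    while n % p == 0:
        n //= p
        e += 1
    return e, n


def legendre(u, p):
    """Legendre symbol (u/p) for p an odd prime not dividing u (Euler's criterion)."""
    return 1 if pow(u % p, (p - 1) // 2, p) == 1 else -1


def hilbert_odd(a, b, p):
    """(a, b)_p for nonzero integers a, b and an odd prime p, reduced as in the text
    to the two cases of Proposition 25."""
    alpha, u = split(a, p)
    beta, v = split(b, p)
    alpha, beta = alpha % 2, beta % 2    # Prop. 23 (i), (ii): remove factors p^2
    if alpha and beta:                   # Prop. 23 (iv), (ii): (pu, pv) = (pu, -uv)
        beta, v = 0, -u * v
    if not alpha and not beta:
        return 1                         # Prop. 25 (i): two p-adic units
    if alpha:                            # Prop. 23 (i): (pu, v) = (v, pu)
        u, v = v, u
    return legendre(u, p)                # Prop. 25 (ii): (u, pv)_p = (u/p)


print(hilbert_odd(3, 3, 3), hilbert_odd(5, 5, 5))   # -1 1, i.e. (-1/p) for p = 3, 5
```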


Corollary 26 If $p$ is an odd prime and if $a, b, c \in \mathbb{Q}_p$ are $p$-adic units, then the quadratic form $a\xi^2 + b\eta^2 + c\zeta^2$ is isotropic.

Proof In fact, the quadratic form $-c^{-1}a\xi^2 - c^{-1}b\eta^2 - \zeta^2$ is isotropic, since $(-c^{-1}a, -c^{-1}b)_p = 1$, by Proposition 25.


Proposition 27 Let $a, b \in \mathbb{Q}_2$ with $|a|_2 = |b|_2 = 1$. Then

(i) $(a,b)_2 = 1$ if and only if at least one of $a$, $b$, $a - 4$, $b - 4$ is a square in $\mathbb{Q}_2$;
(ii) $(a,2b)_2 = 1$ if and only if either $a$ or $a + 2b$ is a square in $\mathbb{Q}_2$.

In particular, for any odd integers $a, b$, $(a,b)_2 = 1$ if and only if $a \equiv 1$ or $b \equiv 1 \bmod 4$, and $(a,2b)_2 = 1$ if and only if $a \equiv 1$ or $a + 2b \equiv 1 \bmod 8$.


Proof Suppose there exist $x, y \in \mathbb{Q}_2$ such that $ax^2 + by^2 = 1$ and assume, for example, that $|x|_2 \ge |y|_2$. Then $|x|_2 \ge 1$ and $|x|_2 = 2^\alpha$, where $\alpha \ge 0$. By Corollary VI.14,

$$x = 2^{-\alpha}(x_0 + 4x'), \qquad y = 2^{-\alpha}(y_0 + 4y'),$$

where $x_0 \in \{1,3\}$, $y_0 \in \{0,1,2,3\}$ and $x', y' \in \mathbb{Z}_2$. If $a$ and $b$ are not squares in $\mathbb{Q}_2$ then, by Proposition VI.16, $|a - 1|_2 > 2^{-3}$ and $|b - 1|_2 > 2^{-3}$. Thus

$$a = a_0 + 8a', \qquad b = b_0 + 8b',$$

where $a_0, b_0 \in \{3,5,7\}$ and $a', b' \in \mathbb{Z}_2$. Hence

$$1 = ax^2 + by^2 = 2^{-2\alpha}(a_0x_0^2 + b_0y_0^2 + 8z'),$$

where $z' \in \mathbb{Z}_2$. Since $a_0, b_0$ are odd, $x_0^2 \equiv 1$ and $y_0^2 \equiv 0, 1$ or $4 \bmod 8$, it follows that $a_0 + b_0y_0^2 \equiv 2^{2\alpha} \bmod 8$. Checking the possibilities, either $\alpha = 0$, $y_0^2 \equiv 4 \bmod 8$ and $a_0 = 5$, or $\alpha > 0$, $y_0$ is odd and one of $a_0, b_0$ is equal to 5. Thus, by Proposition VI.16 again, $a - 4$ or $b - 4$ is a square in $\mathbb{Q}_2$. This proves that the condition in (i) is necessary.

If $a$ is a square in $\mathbb{Q}_2$, then certainly $(a,b)_2 = 1$. If $a - 4$ is a square, then $a = 5 + 8a'$, where $a' \in \mathbb{Z}_2$, and $a + 4b = 1 + 8c'$, where $c' \in \mathbb{Z}_2$. Hence $a + 4b$ is a square in $\mathbb{Q}_2$ and the quadratic form $a\xi^2 + b\eta^2$ represents 1. This proves that the condition in (i) is sufficient.

Suppose next that there exist $x, y \in \mathbb{Q}_2$ such that $ax^2 + 2by^2 = 1$. By the same argument as for odd $p$ in Proposition 25, we must have $|x|_2 = 1$, $|y|_2 \le 1$. Thus $x = x_0 + 4x'$, $y = y_0 + 4y'$, where $x_0 \in \{1,3\}$, $y_0 \in \{0,1,2,3\}$ and $x', y' \in \mathbb{Z}_2$. Writing $a = a_0 + 8a'$, $b = b_0 + 8b'$, where $a_0, b_0 \in \{1,3,5,7\}$ and $a', b' \in \mathbb{Z}_2$, we obtain $a_0x_0^2 + 2b_0y_0^2 \equiv 1 \bmod 8$. Since $2y_0^2 \equiv 0$ or $2 \bmod 8$, this implies either $a_0 = 1$ or $a_0 + 2b_0 \equiv 1 \bmod 8$. Hence either $a$ or $a + 2b$ is a square in $\mathbb{Q}_2$. It is obvious that, conversely, $(a,2b)_2 = 1$ if either $a$ or $a + 2b$ is a square in $\mathbb{Q}_2$.

The special case where $a$ and $b$ are integers again follows from Corollary VI.17.
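For $p = 2$ and nonzero integers $a, b$, the reductions of Proposition 23 followed by the integer criteria of Proposition 27 likewise determine $(a,b)_2$. A sketch (function names are illustrative):

```python
def split2(n):
    """Write the nonzero integer n as 2**e * u with u odd; return (e, u)."""
    e = 0
    while n % 2 == 0:
        n //= 2
        e += 1
    return e, n


def hilbert_2(a, b):
    """(a, b)_2 for nonzero integers a, b: the reductions of Proposition 23 followed
    by the congruence criteria of Proposition 27."""
    alpha, u = split2(a)
    beta, v = split2(b)
    alpha, beta = alpha % 2, beta % 2     # remove factors 4 = 2^2
    if alpha and beta:                    # (2u, 2v) = (2u, -uv)
        beta, v = 0, -u * v
    if not alpha and not beta:            # Prop. 27 (i) for odd integers
        return 1 if u % 4 == 1 or v % 4 == 1 else -1
    if alpha:                             # (2u, v) = (v, 2u)
        u, v = v, u
    return 1 if u % 8 == 1 or (u + 2 * v) % 8 == 1 else -1   # Prop. 27 (ii)


print(hilbert_2(-1, -1), hilbert_2(2, 3), hilbert_2(2, 7))   # -1 -1 1
```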


For $F = \mathbb{R}$, the factor group $F^\times/F^{\times 2}$ is of order 2, with 1 and $-1$ as representatives of the two square classes. For $F = \mathbb{Q}_p$, with $p$ odd, it follows from Corollary VI.17 that the factor group $F^\times/F^{\times 2}$ is of order 4. Moreover, if $r$ is an integer such that $(r/p) = -1$, then $1, r, p, rp$ are representatives of the four square classes. Similarly for $F = \mathbb{Q}_2$, the factor group $F^\times/F^{\times 2}$ is of order 8 and $1, 3, 5, 7, 2, 6, 10, 14$ are representatives of the eight square classes. The Hilbert symbol $(a,b)_F$ for these representatives, and hence for all $a, b \in F^\times$, may be determined directly from Propositions 24, 25 and 27. The values obtained are listed in Table 1, where $\varepsilon = (-1/p)$ and thus $\varepsilon = \pm 1$ according as $p \equiv \pm 1 \bmod 4$.
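Table 1 can be recomputed, and its Hadamard property checked, mechanically. To keep the sketch self-contained it uses the standard closed formula for the Hilbert symbol, which is not stated in the text but agrees with Propositions 24, 25 and 27 on integer arguments: writing $a = p^\alpha u$, $b = p^\beta v$ with $u, v$ prime to $p$, one has $(a,b)_p = (-1)^{\alpha\beta(p-1)/2}(u/p)^\beta(v/p)^\alpha$ for odd $p$, and $(a,b)_2 = (-1)^{\epsilon(u)\epsilon(v) + \alpha\omega(v) + \beta\omega(u)}$ with $\epsilon(u) = (u-1)/2$, $\omega(u) = (u^2-1)/8$. For odd $p$ it takes $r$ to be the least quadratic non-residue, which lies in the same square class as a primitive root. The convention `p = 0` for the real place is ours.

```python
def split(n, p):
    """Write the nonzero integer n as p**e * u with p not dividing u; return (e, u)."""
    e = 0
    while n % p == 0:
        n //= p
        e += 1
    return e, n


def hilbert(a, b, p):
    """(a, b)_p for nonzero integers a, b via the closed formula; p = 0 means R."""
    if p == 0:
        return -1 if a < 0 and b < 0 else 1
    (al, u), (be, v) = split(a, p), split(b, p)
    if p == 2:
        def eps(x):
            return (x - 1) // 2 % 2

        def omega(x):
            return (x * x - 1) // 8 % 2

        return (-1) ** ((eps(u) * eps(v) + al * omega(v) + be * omega(u)) % 2)

    def leg(x):
        return 1 if pow(x % p, (p - 1) // 2, p) == 1 else -1

    return (-1) ** (al * be * ((p - 1) // 2) % 2) * leg(u) ** be * leg(v) ** al


def representatives(p):
    """Square-class representatives, in the order used in Table 1."""
    if p == 0:
        return [1, -1]
    if p == 2:
        return [1, 3, 6, 2, 14, 10, 5, 7]
    r = next(x for x in range(2, p) if pow(x, (p - 1) // 2, p) == p - 1)
    return [1, p, r * p, r]


def table(p):
    reps = representatives(p)
    return [[hilbert(a, b, p) for b in reps] for a in reps]


def is_hadamard(H):
    """True if H H^t = n I for the n x n matrix H of +1/-1 entries."""
    n = len(H)
    return all(sum(H[i][k] * H[j][k] for k in range(n)) == (n if i == j else 0)
               for i in range(n) for j in range(n))


for p in (0, 3, 5, 2):
    print("p =", p if p else "infinity")
    for row in table(p):
        print(" ".join("+" if x == 1 else "-" for x in row))
    print("Hadamard:", is_hadamard(table(p)))
```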

It will be observed that each of the three symmetric matrices in Table 1 is a Hadamard matrix! In particular, in each row after the first row of $+$'s there are equally many $+$ and $-$ signs. This property turns out to be of basic importance and prompts the following definition:

A field $F$ is a Hilbert field if some $a \in F^\times$ is not a square and if, for every such $a$, the subgroup $G_a$ has index 2 in $F^\times$.

Thus the real field $\mathbb{R} = \mathbb{Q}_\infty$ and the $p$-adic fields $\mathbb{Q}_p$ are all Hilbert fields. We now show that in Hilbert fields further properties of the Hilbert symbol may be derived.


Proposition 28 A field $F$ is a Hilbert field if and only if some $a \in F^\times$ is not a square and the Hilbert symbol has the following additional properties:

(i) if $(a, b)_F = 1$ for every $b \in F^\times$, then $a$ is a square in $F^\times$;
(ii) $(a, bc)_F = (a, b)_F(a, c)_F$ for all $a, b, c \in F^\times$.


Proof Let $F$ be a Hilbert field. Then (i) holds, since $G_a \neq F^\times$ if $a$ is not a square. If $(a, b)_F = 1$ or $(a, c)_F = 1$, then (ii) follows from Proposition 23(v). Suppose now that $(a, b)_F = -1$ and $(a, c)_F = -1$. Then $a$ is not a square and $f_a$ does not represent $b$ or $c$. Since $F$ is a Hilbert field and $b, c \notin G_a$, it follows that $bc \in G_a$. Thus $(a, bc)_F = 1$. The converse is equally simple.
Table 1. Values of the Hilbert symbol $(a, b)_F$ for $F = \mathbb{Q}_v$

$\mathbb{Q}_\infty = \mathbb{R}$:

a\b    1   −1
  1    +    +
 −1    +    −

$\mathbb{Q}_p$, $p$ odd:

a\b    1    p   rp    r
  1    +    +    +    +
  p    +    ε   −ε    −
 rp    +   −ε    ε    −
  r    +    −    −    +

where $r$ is a primitive root mod $p$ and $\varepsilon = (-1)^{(p-1)/2}$

$\mathbb{Q}_2$:

a\b    1    3    6    2   14   10    5    7
  1    +    +    +    +    +    +    +    +
  3    +    −    +    −    +    −    +    −
  6    +    +    −    −    +    +    −    −
  2    +    −    −    +    +    −    −    +
 14    +    +    +    +    −    −    −    −
 10    +    −    +    −    −    +    −    +
  5    +    +    −    −    −    −    +    +
  7    +    −    −    +    −    +    +    −
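The entries of Table 1 can be cross-checked mechanically. The following Python sketch is not taken from the text: it assumes the standard closed formulas for the Hilbert symbol on $\mathbb{Q}_p$, restricted to nonzero integer arguments. Write $a = p^\alpha u$ and $b = p^\beta v$ with $u, v$ prime to $p$. For odd $p$ the formula is $(a,b)_p = (-1)^{\alpha\beta(p-1)/2}(u/p)^\beta(v/p)^\alpha$. For $p = 2$ it is $(-1)^{\varepsilon(u)\varepsilon(v) + \alpha\omega(v) + \beta\omega(u)}$, with $\varepsilon(u) = (u-1)/2$ and $\omega(u) = (u^2-1)/8$. The helper names `hilbert`, `table` and `is_hadamard` are hypothetical. The sketch rebuilds the three tables and checks the Hadamard property noted above.

```python
def legendre(u, p):
    """Legendre symbol (u/p) for an odd prime p not dividing u (Euler's criterion)."""
    return -1 if pow(u % p, (p - 1) // 2, p) == p - 1 else 1


def split(a, p):
    """Write a nonzero integer a as p**alpha * u with p not dividing u."""
    assert a != 0
    alpha = 0
    while a % p == 0:
        a //= p
        alpha += 1
    return alpha, a


def hilbert(a, b, p):
    """Hilbert symbol (a, b)_p for nonzero integers a, b; p is a prime or 'inf'."""
    if p == "inf":  # over R, (a, b) = -1 exactly when a < 0 and b < 0
        return -1 if a < 0 and b < 0 else 1
    alpha, u = split(a, p)
    beta, v = split(b, p)
    if p == 2:
        eps = lambda w: ((w - 1) // 2) % 2
        omega = lambda w: ((w * w - 1) // 8) % 2
        e = eps(u) * eps(v) + alpha * omega(v) + beta * omega(u)
        return -1 if e % 2 else 1
    sign = -1 if (alpha * beta * ((p - 1) // 2)) % 2 else 1
    return sign * legendre(u, p) ** beta * legendre(v, p) ** alpha


def table(reps, p):
    """Matrix of (a, b)_p with a, b running over the given square-class representatives."""
    return [[hilbert(a, b, p) for b in reps] for a in reps]


def is_hadamard(h):
    """True if the +-1 matrix h satisfies h h^T = n I."""
    n = len(h)
    return all(
        sum(x * y for x, y in zip(h[i], h[j])) == (n if i == j else 0)
        for i in range(n)
        for j in range(n)
    )


real = table([1, -1], "inf")
q3 = table([1, 3, 6, 2], 3)                  # p = 3, r = 2, epsilon = -1
q2 = table([1, 3, 6, 2, 14, 10, 5, 7], 2)    # same ordering as Table 1
for row in q2:                               # prints the Q_2 part of Table 1
    print(" ".join("+" if s == 1 else "-" for s in row))
```

Row and column orderings follow Table 1, so the printed signs can be compared with the table line by line.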


The definition of a Hilbert field can be reformulated in terms of quadratic forms. If $f$ is an anisotropic binary quadratic form with determinant $d$, then $-d$ is not a square and $f$ is equivalent to a diagonal form $a(\xi^2 + d\eta^2)$. It follows that $F$ is a Hilbert field if and only if there exists an anisotropic binary quadratic form and for each such form there is, apart from equivalent forms, exactly one other whose determinant is in the same square class. We are going to show that Hilbert fields can also be characterized by means of quadratic forms in 4 variables.


Lemma 29 Let $F$ be an arbitrary field and $a, b$ elements of $F^\times$ with $(a, b)_F = -1$. Then the quadratic form

$$f_{a,b} = \xi_1^2 - a\xi_2^2 - b(\xi_3^2 - a\xi_4^2)$$

is anisotropic. Moreover, the set $G_{a,b}$ of all elements of $F^\times$ which are represented by $f_{a,b}$ is a subgroup of $F^\times$.


Proof Since $(a, b)_F = -1$, $a$ is not a square and hence the binary form $f_a$ is anisotropic. If $f_{a,b}$ were isotropic, some $c \in F^\times$ would be represented by both $f_a$ and $bf_a$. But then $(a, c)_F = 1$ and $(a, bc)_F = 1$. Since $(a, b)_F = -1$, this contradicts Proposition 23.

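Clearly if $c \in G_{a,b}$, then also $c^{-1} \in G_{a,b}$, and it is easily verified that if

$$\zeta_1 = \xi_1\eta_1 + a\xi_2\eta_2 + b\xi_3\eta_3 - ab\xi_4\eta_4, \quad \zeta_2 = \xi_1\eta_2 + \xi_2\eta_1 - b\xi_3\eta_4 + b\xi_4\eta_3,$$
$$\zeta_3 = \xi_1\eta_3 + \xi_3\eta_1 + a\xi_2\eta_4 - a\xi_4\eta_2, \quad \zeta_4 = \xi_1\eta_4 + \xi_4\eta_1 + \xi_2\eta_3 - \xi_3\eta_2,$$

then

$$\zeta_1^2 - a\zeta_2^2 - b\zeta_3^2 + ab\zeta_4^2 = (\xi_1^2 - a\xi_2^2 - b\xi_3^2 + ab\xi_4^2)(\eta_1^2 - a\eta_2^2 - b\eta_3^2 + ab\eta_4^2).$$

It follows that $G_{a,b}$ is a subgroup of $F^\times$.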
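The product identity above is the multiplicativity of the norm form of a quaternion algebra. The following Python sketch checks numerically that $\zeta_1, \ldots, \zeta_4$, as defined above, satisfy it for random integer inputs. The function names `norm_ab` and `compose` are illustrative and do not come from the text.

```python
import random


def norm_ab(x, a, b):
    """The form f_{a,b}(x) = x1^2 - a x2^2 - b x3^2 + ab x4^2."""
    x1, x2, x3, x4 = x
    return x1 * x1 - a * x2 * x2 - b * x3 * x3 + a * b * x4 * x4


def compose(x, y, a, b):
    """The bilinear expressions zeta_1, ..., zeta_4 of Lemma 29."""
    x1, x2, x3, x4 = x
    y1, y2, y3, y4 = y
    return (
        x1 * y1 + a * x2 * y2 + b * x3 * y3 - a * b * x4 * y4,
        x1 * y2 + x2 * y1 - b * x3 * y4 + b * x4 * y3,
        x1 * y3 + x3 * y1 + a * x2 * y4 - a * x4 * y2,
        x1 * y4 + x4 * y1 + x2 * y3 - x3 * y2,
    )


rng = random.Random(0)  # fixed seed so the check is reproducible
for _ in range(1000):
    a, b = rng.randint(-9, 9), rng.randint(-9, 9)
    x = [rng.randint(-20, 20) for _ in range(4)]
    y = [rng.randint(-20, 20) for _ in range(4)]
    assert norm_ab(compose(x, y, a, b), a, b) == norm_ab(x, a, b) * norm_ab(y, a, b)

print(compose((1, 1, 1, 1), (1, 0, 1, 0), 2, 3))  # (4, 4, 2, 2)
```

The identity is polynomial in $a$, $b$ and the variables, so it holds over any field.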


Proposition 30 A field $F$ is a Hilbert field if and only if one of the following mutually exclusive conditions is satisfied:

(A) $F$ is an ordered field and every positive element of $F$ is a square;
(B) there exists, up to equivalence, one and only one anisotropic quaternary quadratic form over $F$.


Proof Suppose first that the field $F$ is of type (A). Then $-1$ is not a square, since $-1 + 1 = 0$ and any nonzero square is positive. By Proposition 10, any anisotropic binary quadratic form is equivalent over $F$ to exactly one of the forms $\xi^2 + \eta^2$, $-\xi^2 - \eta^2$ and therefore $F$ is a Hilbert field. Since the quadratic forms $\xi_1^2 + \xi_2^2 + \xi_3^2 + \xi_4^2$ and $-\xi_1^2 - \xi_2^2 - \xi_3^2 - \xi_4^2$ are anisotropic and inequivalent, the field $F$ is not of type (B).

Suppose next that the field $F$ is of type (B). The anisotropic quaternary quadratic form must be universal, since it is equivalent to any nonzero scalar multiple. Hence, for any $a \in F^\times$ there exists an anisotropic diagonal form

$$-a\xi_1^2 - b'\xi_2^2 - c'\xi_3^2 - d'\xi_4^2,$$

where $b', c', d' \in F^\times$. In particular, for $a = -1$, this shows that not every element of $F^\times$ is a square. The ternary quadratic form $h = -b'\xi_2^2 - c'\xi_3^2 - d'\xi_4^2$ is certainly anisotropic. If $h$ does not represent 1, the quaternary quadratic form $-\xi_1^2 + h$ is also anisotropic and hence, by Witt's cancellation theorem, $a$ must be a square. Consequently, if $a \in F^\times$ is not a square, then there exists an anisotropic form

$$-a\xi_1^2 + \xi_2^2 - b\xi_3^2 - c\xi_4^2.$$

Thus for any $a \in F^\times$ which is not a square, there exists $b \in F^\times$ such that $(a, b)_F = -1$. If $(a, b)_F = (a, b')_F = -1$ then, by Lemma 29, the forms

$$\xi_1^2 - a\xi_2^2 - b(\xi_3^2 - a\xi_4^2), \quad \xi_1^2 - a\xi_2^2 - b'(\xi_3^2 - a\xi_4^2)$$

are anisotropic and thus equivalent. It follows from Witt's cancellation theorem that the binary forms $b(\xi_3^2 - a\xi_4^2)$ and $b'(\xi_3^2 - a\xi_4^2)$ are equivalent. Consequently $\xi_3^2 - a\xi_4^2$ represents $bb'$ and $(a, bb')_F = 1$. Thus $G_a$ has index 2 in $F^\times$ for any $a \in F^\times$ which is not a square, and $F$ is a Hilbert field.

Suppose now that $F$ is a Hilbert field. Then there exists $a \in F^\times$ which is not a square and, for any such $a$, there exists $b \in F^\times$ such that $(a, b)_F = -1$. Consequently, by Lemma 29, the quaternary quadratic form $f_{a,b}$ is anisotropic and represents 1. Conversely, any anisotropic quaternary quadratic form which represents 1 is equivalent to some form

$$g = \xi_1^2 - a\xi_2^2 - b(\xi_3^2 - c\xi_4^2)$$

with $a, b, c \in F^\times$. Evidently $a$ and $c$ are not squares, and if $d$ is represented by $\xi_3^2 - c\xi_4^2$, then $bd$ is not represented by $\xi_1^2 - a\xi_2^2$. Thus $(c, d)_F = 1$ implies $(a, bd)_F = -1$. In particular, $(a, b)_F = -1$ and hence $(c, d)_F = 1$ implies $(a, d)_F = 1$. By interchanging the roles of $\xi_1^2 - a\xi_2^2$ and $\xi_3^2 - c\xi_4^2$ we see that $(a, d)_F = 1$ also implies $(c, d)_F = 1$. Hence $(ac, d)_F = 1$ for all $d \in F^\times$. Thus $ac$ is a square and $g$ is equivalent to

$$f_{a,b} = \xi_1^2 - a\xi_2^2 - b(\xi_3^2 - a\xi_4^2).$$


We now show that $f_{a,b}$ and $f_{a',b'}$ are equivalent if $(a, b)_F = (a', b')_F = -1$. Suppose first that $(a, b')_F = -1$. Then $(a, bb')_F = 1$ and there exist $x_3, x_4 \in F$ such that $b' = b(x_3^2 - ax_4^2)$. Since

$$(x_3^2 - ax_4^2)(\xi_3^2 - a\xi_4^2) = \eta_3^2 - a\eta_4^2,$$

where $\eta_3 = x_3\xi_3 + ax_4\xi_4$, $\eta_4 = x_4\xi_3 + x_3\xi_4$, it follows that $f_{a,b}$ is equivalent to $f_{a,b'}$. For the same reason $f_{a,b'}$ is equivalent to $f_{a',b'}$ and thus $f_{a,b}$ is equivalent to $f_{a',b'}$. By symmetry, the same conclusion holds if $(a', b)_F = -1$. Thus we now suppose

$$(a, b')_F = (a', b)_F = 1.$$

But then $(a, bb')_F = (a', bb')_F = -1$ and so, by what we have already proved,

$$f_{a,b} \sim f_{a,bb'} \sim f_{a',bb'} \sim f_{a',b'}.$$
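The composition identity used in the first case can be checked by expanding the right-hand side, with $\eta_3 = x_3\xi_3 + ax_4\xi_4$ and $\eta_4 = x_4\xi_3 + x_3\xi_4$:

```latex
\begin{aligned}
\eta_3^2 - a\eta_4^2
  &= (x_3^2\xi_3^2 + 2a x_3x_4\xi_3\xi_4 + a^2x_4^2\xi_4^2)
     - a(x_4^2\xi_3^2 + 2x_3x_4\xi_3\xi_4 + x_3^2\xi_4^2) \\
  &= x_3^2\xi_3^2 - a x_4^2\xi_3^2 - a x_3^2\xi_4^2 + a^2x_4^2\xi_4^2 \\
  &= (x_3^2 - a x_4^2)(\xi_3^2 - a\xi_4^2).
\end{aligned}
```

Hence $b'(\xi_3^2 - a\xi_4^2) = b(\eta_3^2 - a\eta_4^2)$. The substitution $(\xi_3, \xi_4) \mapsto (\eta_3, \eta_4)$ has determinant $x_3^2 - ax_4^2 = b'/b \neq 0$, so it is invertible.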


Together, the last two paragraphs show that if $F$ is a Hilbert field, then all anisotropic quaternary quadratic forms which represent 1 are equivalent. Hence the Hilbert field $F$ is of type (B) if every anisotropic quaternary quadratic form represents 1.

Suppose now that some anisotropic quaternary quadratic form does not represent 1. Then some scalar multiple of this form represents 1, but is not universal. Thus $f_{a,b}$ is not universal for some $a, b \in F^\times$ with $(a, b)_F = -1$. By Lemma 29, the set $G_{a,b}$ of all $c \in F^\times$ which are represented by $f_{a,b}$ is a subgroup of $F^\times$. In fact $G_{a,b} = G_a$, since $G_a \subseteq G_{a,b}$, $G_{a,b} \neq F^\times$ and $G_a$ has index 2 in $F^\times$. Since $f_{a,b} \sim f_{b,a}$, we have also $G_{a,b} = G_b$. Thus $(a, c)_F = (b, c)_F$ for all $c \in F^\times$, and hence $(ab, c)_F = 1$ for all $c \in F^\times$. Thus $ab$ is a square and $(a, a)_F = (a, b)_F = -1$. Since $(a, -a)_F = 1$, it follows that $(a, -1)_F = -1$. Hence $f_{a,b} \sim f_{a,a} \sim f_{a,-1}$. Replacing $a, b$ by $-1, a$ we now obtain $(-1, -1)_F = -1$ and $f_{a,-1} \sim f_{-1,-1}$.

Thus the form

$$f = \xi_1^2 + \xi_2^2 + \xi_3^2 + \xi_4^2$$

is not universal and the subgroup $P$ of all elements of $F^\times$ represented by $f$ coincides with the set of all elements of $F^\times$ represented by $\xi^2 + \eta^2$. Hence $P + P \subseteq P$ and $P$ is the set of all $c \in F^\times$ such that $(-1, c)_F = 1$. Consequently $-1 \notin P$ and $F$ is the disjoint union of the sets $\{0\}$, $P$ and $-P$. Thus $F$ is an ordered field with $P$ as the set of positive elements.

For any $c \in F^\times$, $c^2 \in P$. It follows that if $a, b \in P$ then $(-a, -b)_F = -1$, since $a\xi^2 + b\eta^2$ does not represent $-1$. Hence, if $a, b \in P$, then $(-a, -b)_F = -1 = (-1, -b)_F$ and $(-a, b)_F = 1 = (-1, b)_F$. Thus, for all $c \in F^\times$, $(-a, c)_F = (-1, c)_F$ and hence $(a, c)_F = 1$. Therefore $a$ is a square and the Hilbert field $F$ is of type (A).


Proposition 31 If $F$ is a Hilbert field of type (B), then any quadratic form $f$ in more than 4 variables is isotropic. For any prime $p$, the field $\mathbb{Q}_p$ of $p$-adic numbers is a Hilbert field of type (B).

Proof The quadratic form $f$ is equivalent to a diagonal form $a_1\xi_1^2 + \cdots + a_n\xi_n^2$, where $n > 4$. If $g = a_1\xi_1^2 + \cdots + a_4\xi_4^2$ is isotropic, then so also is $f$. If $g$ is anisotropic then, since $F$ is of type (B), it is universal and represents $-a_5$, so that $f$ is again isotropic. This proves the first part of the proposition.

We already know that $\mathbb{Q}_p$ is a Hilbert field and we have already shown, after the proof of Corollary VI.17, that $\mathbb{Q}_p$ is not an ordered field. Hence $\mathbb{Q}_p$ is a Hilbert field of type (B).


Proposition 10 shows that two non-singular quadratic forms in $n$ variables, with coefficients from a Hilbert field $F$ of type (A), are equivalent over $F$ if and only if they have the same positive index. We consider next the equivalence of quadratic forms with coefficients from a Hilbert field of type (B). We will show that they are classified by their determinant and their Hasse invariant.

If a non-singular quadratic form $f$, with coefficients from a Hilbert field $F$, is equivalent to a diagonal form $a_1\xi_1^2 + \cdots + a_n\xi_n^2$, then its Hasse invariant is defined to be the product of Hilbert symbols

$$s_F(f) = \prod_{1 \le j < k \le n} (a_j, a_k)_F.$$

We write $s_p(f)$ for $s_F(f)$ when $F = \mathbb{Q}_p$. (It should be noted that some authors define the Hasse invariant with $\prod_{j \le k}$ in place of $\prod_{j < k}$.) It must first be shown that this is indeed an invariant of $f$, and for this we make use of Witt's chain equivalence theorem:


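Lemma 32 Let $V$ be a non-singular quadratic space over an arbitrary field $F$. If $\mathcal{B} = \{u_1, \ldots, u_n\}$ and $\mathcal{B}' = \{u_1', \ldots, u_n'\}$ are both orthogonal bases of $V$, then there exists a chain of orthogonal bases $\mathcal{B}_0, \mathcal{B}_1, \ldots, \mathcal{B}_m$, with $\mathcal{B}_0 = \mathcal{B}$ and $\mathcal{B}_m = \mathcal{B}'$, such that $\mathcal{B}_{j-1}$ and $\mathcal{B}_j$ differ by at most 2 vectors for each $j \in \{1, \ldots, m\}$.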


Proof Since there is nothing to prove if $\dim V = n \le 2$, we assume that $n \ge 3$ and the result holds for all smaller values of $n$. Let $p = p(\mathcal{B})$ be the number of nonzero coefficients in the representation of $u_1'$ as a linear combination of $u_1, \ldots, u_n$. Without loss of generality we may suppose

$$u_1' = \sum_{j=1}^{p} a_ju_j,$$

where $a_j \neq 0$ ($1 \le j \le p$). If $p = 1$, we may replace $u_1$ by $u_1'$ and the result now follows by applying the induction hypothesis to the subspace of all vectors orthogonal to $u_1'$. Thus we now assume $p \ge 2$. We have

$$a_1^2(u_1, u_1) + \cdots + a_p^2(u_p, u_p) = (u_1', u_1') \neq 0,$$

and each summand on the left is nonzero. If the sum of the first two terms is zero, then $p > 2$ and either the sum of the first and third terms is nonzero or the sum of the second and third terms is nonzero. Hence we may suppose without loss of generality that

$$a_1^2(u_1, u_1) + a_2^2(u_2, u_2) \neq 0.$$

If we put

$$v_1 = a_1u_1 + a_2u_2, \quad v_2 = u_1 + bu_2, \quad v_j = u_j \text{ for } 3 \le j \le n,$$

where $b = -a_1(u_1, u_1)/(a_2(u_2, u_2))$, then $\mathcal{B}_1 = \{v_1, \ldots, v_n\}$ is an orthogonal basis and $u_1' = v_1 + a_3v_3 + \cdots + a_pv_p$. Thus $p(\mathcal{B}_1) < p(\mathcal{B})$. By replacing $\mathcal{B}$ by $\mathcal{B}_1$ and repeating the procedure, we must arrive after $s < n$ steps at an orthogonal basis $\mathcal{B}_s$ for which $p(\mathcal{B}_s) = 1$. The induction hypothesis can now be applied to $\mathcal{B}_s$ in the same way as for $\mathcal{B}$.
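One step of the reduction in this proof can be traced on a concrete example. The Python sketch below uses exact rational arithmetic. It works in coordinates with respect to the orthogonal basis $u_1, \ldots, u_n$ and assumes $(u_i, u_i) = q_i$. It builds $v_1, \ldots, v_n$ as in the proof and checks that they are pairwise orthogonal and that $u_1' = v_1 + a_3v_3 + \cdots$. The names `bilinear` and `reduction_step` and the sample values of `q` and `a` are hypothetical.

```python
from fractions import Fraction


def bilinear(x, y, q):
    """(x, y) = sum q_i x_i y_i, in coordinates w.r.t. an orthogonal basis with (u_i, u_i) = q_i."""
    return sum(Fraction(qi) * xi * yi for qi, xi, yi in zip(q, x, y))


def reduction_step(a, q):
    """One step of the proof of Lemma 32 (illustrative sketch).

    a: coordinates of u'_1 in the basis u_1, ..., u_n, with a[0], a[1] nonzero and
       a[0]^2 q[0] + a[1]^2 q[1] != 0.
    Returns v_1, ..., v_n as coordinate lists in the old basis.
    """
    n = len(q)
    a1, a2 = Fraction(a[0]), Fraction(a[1])
    assert a1 != 0 and a2 != 0 and a1 * a1 * q[0] + a2 * a2 * q[1] != 0
    b = -a1 * q[0] / (a2 * q[1])
    unit = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    v1 = [a1 * s + a2 * t for s, t in zip(unit[0], unit[1])]  # a1 u1 + a2 u2
    v2 = [s + b * t for s, t in zip(unit[0], unit[1])]        # u1 + b u2
    return [v1, v2] + unit[2:]                                 # v_j = u_j for j >= 3


q = [1, 2, -1]   # hypothetical values of (u_i, u_i)
a = [1, 1, 1]    # u'_1 = u_1 + u_2 + u_3, so p = 3
v = reduction_step(a, q)
# The new basis is orthogonal ...
assert all(bilinear(v[i], v[j], q) == 0 for i in range(3) for j in range(i))
# ... and u'_1 = v_1 + a_3 v_3 has only p - 1 = 2 nonzero coordinates in it.
assert [s + a[2] * t for s, t in zip(v[0], v[2])] == a
```

Note that $(v_1, v_1) = a_1^2(u_1, u_1) + a_2^2(u_2, u_2)$. This is why the proof arranges for that sum to be nonzero: it makes $v_1$ non-isotropic, so $\{v_1, \ldots, v_n\}$ is again an orthogonal basis.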


Proposition 33 Let $F$ be a Hilbert field. If the non-singular diagonal forms $a_1\xi_1^2 + \cdots + a_n\xi_n^2$ and $b_1\xi_1^2 + \cdots + b_n\xi_n^2$ are equivalent over $F$, then

$$\prod_{1 \le j < k \le n} (a_j, a_k)_F = \prod_{1 \le j < k \le n} (b_j, b_k)_F.$$

Proof Suppose first that $n = 2$. Since $a_1\xi_1^2 + a_2\xi_2^2$ represents $b_1$, $\xi_1^2 + a_1^{-1}a_2\xi_2^2$ represents $a_1^{-1}b_1$ and hence $(-a_1^{-1}a_2, a_1^{-1}b_1)_F = 1$. Thus $(a_1b_1, -a_1a_2)_F = 1$ and hence $(a_1b_1, a_2b_1)_F = 1$. But (Proposition 28(ii)) the Hilbert symbol is multiplicative, since $F$ is a Hilbert field. It follows that $(a_1, a_2)_F(b_1, a_1a_2b_1)_F = 1$. Since the determinants $a_1a_2$ and $b_1b_2$ are in the same square class, this implies $(a_1, a_2)_F = (b_1, b_2)_F$, as we wished to prove.

Suppose now that $n > 2$. Since the Hilbert symbol is symmetric, the product $\prod_{1 \le j < k \le n} (a_j, a_k)_F$ is independent of the ordering of $a_1, \ldots, a_n$. It follows from Lemma 32 that we may restrict attention to the case where $a_1\xi_1^2 + a_2\xi_2^2$ is equivalent to $b_1\xi_1^2 + b_2\xi_2^2$ and $a_j = b_j$ for all $j > 2$. Then $(a_1, a_2)_F = (b_1, b_2)_F$, by what we have already proved, and it is enough to show that

$$(a_1, c)_F(a_2, c)_F = (b_1, c)_F(b_2, c)_F \quad \text{for any } c \in F^\times.$$

But this follows from the multiplicativity of the Hilbert symbol and the fact that $a_1a_2$ and $b_1b_2$ are in the same square class.


Proposition 33 shows that the Hasse invariant is well-defined. 


Proposition 34 Two non-singular quadratic forms in n variables, with coefficients 
from a Hilbert field F of type (B), are equivalent over F if and only if they have the 
same Hasse invariant and their determinants are in the same square class. 


Proof Only the sufficiency of the conditions needs to be proved. Since this is trivial for $n = 1$, we suppose first that $n = 2$. It is enough to show that if

$$f = a(\xi_1^2 + d\xi_2^2), \quad g = b(\eta_1^2 + d\eta_2^2),$$

where $(a, ad)_F = (b, bd)_F$, then $f$ is equivalent to $g$. The hypothesis implies $(-d, a)_F = (-d, b)_F$ and hence $(-d, ab)_F = 1$. Thus $\xi^2 + d\eta^2$ represents $ab$ and $f$ represents $a \cdot ab = a^2b$, that is, $f$ represents $b$. Since $\det f$ and $\det g$ are in the same square class, it follows that $f$ is equivalent to $g$.

Suppose next that $n \ge 3$ and the result holds for all smaller values of $n$. Let $f(\xi_1, \ldots, \xi_n)$ and $g(\eta_1, \ldots, \eta_n)$ be non-singular quadratic forms with $\det f = \det g = d$ and $s_F(f) = s_F(g)$. By Proposition 31, the quadratic form

$$h(\xi_1, \ldots, \xi_n, \eta_1, \ldots, \eta_n) = f(\xi_1, \ldots, \xi_n) - g(\eta_1, \ldots, \eta_n)$$

is isotropic and hence, by Proposition 7, there exists some $a_1 \in F^\times$ which is represented by both $f$ and $g$. Thus

$$f \sim a_1\xi_1^2 + f^*, \quad g \sim a_1\xi_1^2 + g^*,$$

where

$$f^* = a_2\xi_2^2 + \cdots + a_n\xi_n^2, \quad g^* = b_2\xi_2^2 + \cdots + b_n\xi_n^2.$$

Evidently $\det f^*$ and $\det g^*$ are in the same square class and $s_F(f) = c\,s_F(f^*)$, $s_F(g) = c'\,s_F(g^*)$, where

$$c = (a_1, a_2 \cdots a_n)_F = (a_1, a_1)_F(a_1, d)_F = (a_1, b_2 \cdots b_n)_F = c'.$$

Hence $s_F(f^*) = s_F(g^*)$. It follows from the induction hypothesis that $f^* \sim g^*$, and so $f \sim g$.
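For $F = \mathbb{Q}_p$, Proposition 34 turns equivalence into a finite computation. The following Python sketch assumes integer diagonal coefficients and the standard closed formulas for $(a, b)_p$ in terms of Legendre symbols. It also assumes the standard criterion that a $p$-adic unit is a square in $\mathbb{Q}_p$ exactly when it is a square mod $p$ ($p$ odd) or $\equiv 1 \bmod 8$ ($p = 2$). All function names are hypothetical.

```python
from math import prod


def legendre(u, p):
    """Legendre symbol (u/p), p an odd prime not dividing u."""
    return -1 if pow(u % p, (p - 1) // 2, p) == p - 1 else 1


def split(a, p):
    """Write a nonzero integer a as p**alpha * u with p not dividing u."""
    alpha = 0
    while a % p == 0:
        a, alpha = a // p, alpha + 1
    return alpha, a


def hilbert(a, b, p):
    """(a, b)_p for nonzero integers a, b and a prime p (standard closed formulas)."""
    (al, u), (be, v) = split(a, p), split(b, p)
    if p == 2:
        eps = lambda w: ((w - 1) // 2) % 2
        om = lambda w: ((w * w - 1) // 8) % 2
        return (-1) ** (eps(u) * eps(v) + al * om(v) + be * om(u))
    return (-1) ** (al * be * ((p - 1) // 2)) * legendre(u, p) ** be * legendre(v, p) ** al


def is_square_qp(x, p):
    """Is the nonzero integer x a square in Q_p?"""
    alpha, u = split(x, p)
    if alpha % 2:
        return False
    return u % 8 == 1 if p == 2 else legendre(u, p) == 1


def hasse(coeffs, p):
    """s_p(f) = product over j < k of (a_j, a_k)_p, for f = sum a_j x_j^2."""
    n = len(coeffs)
    return prod(hilbert(coeffs[j], coeffs[k], p) for j in range(n) for k in range(j + 1, n))


def equivalent_over_qp(f, g, p):
    """Proposition 34 for Q_p: same dimension, determinants in one square class, same Hasse invariant."""
    return (
        len(f) == len(g)
        and is_square_qp(prod(f) * prod(g), p)
        and hasse(f, p) == hasse(g, p)
    )


print(equivalent_over_qp([1, 1], [-1, -1], 3), equivalent_over_qp([1, 1], [-1, -1], 2))  # True False
```

The example compares $\xi^2 + \eta^2$ with $-\xi^2 - \eta^2$. The two forms have the same determinant, and the Hasse invariant separates them over $\mathbb{Q}_2$ but not over $\mathbb{Q}_3$, in agreement with $(-1, -1)_2 = -1$ in Table 1.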


3 The Hasse–Minkowski Theorem


Let $a, b, c$ be nonzero squarefree integers which are relatively prime in pairs. It was proved by Legendre (1785) that the equation

$$ax^2 + by^2 + cz^2 = 0$$

has a nontrivial solution in integers $x, y, z$ if and only if $a, b, c$ are not all of the same sign and the congruences

$$u^2 \equiv -bc \bmod a, \quad v^2 \equiv -ca \bmod b, \quad w^2 \equiv -ab \bmod c$$

are all soluble.

It was first completely proved by Gauss (1801) that every positive integer which is not of the form $4^n(8k + 7)$ can be represented as a sum of three squares. Legendre had given a proof, based on the assumption that if $a$ and $m$ are relatively prime positive integers, then the arithmetic progression

$$a, \; a + m, \; a + 2m, \; \ldots$$

contains infinitely many primes. Although his proof of this assumption was faulty, his intuition that it had a role to play in the arithmetic theory of quadratic forms was inspired. The assumption was first proved by Dirichlet (1837) and will be referred to here as 'Dirichlet's theorem on primes in an arithmetic progression'. In the present chapter Dirichlet's theorem will simply be assumed, but it will be proved (in a quantitative form) in Chapter X.

It was shown by Meyer (1884), although the published proof was incomplete, that 
a quadratic form in five or more variables with integer coefficients is isotropic if it is 
neither positive definite nor negative definite. 

The preceding results are all special cases of the Hasse–Minkowski theorem, which is the subject of this section. Let $\mathbb{Q}$ denote the field of rational numbers. By Ostrowski's theorem (Proposition VI.4), the completions $\mathbb{Q}_v$ of $\mathbb{Q}$ with respect to an arbitrary nontrivial absolute value $|\cdot|_v$ are the field $\mathbb{Q}_\infty = \mathbb{R}$ of real numbers and the fields $\mathbb{Q}_p$ of $p$-adic numbers, where $p$ is an arbitrary prime. The Hasse–Minkowski theorem has the following statement:

A non-singular quadratic form $f(\xi_1, \ldots, \xi_n)$ with coefficients from $\mathbb{Q}$ is isotropic in $\mathbb{Q}$ if and only if it is isotropic in every completion of $\mathbb{Q}$.


This concise statement contains, and to some extent conceals, a remarkable amount 
of information. (Its equivalence to Legendre’s theorem when n = 3 may be established 
by elementary arguments.) The theorem was first stated and proved by Hasse (1923). 
Minkowski (1890) had derived necessary and sufficient conditions for the equivalence 
over Q of two non-singular quadratic forms with rational coefficients by using known 
results on quadratic forms with integer coefficients. The role of p-adic numbers was 
taken by congruences modulo prime powers. Hasse drew attention to the simplifica- 
tions obtained by studying from the outset quadratic forms over the field Q, rather 
than the ring Z, and soon afterwards (1924) he showed that the theorem continues to 
hold if the rational field Q is replaced by an arbitrary algebraic number field (with its 
corresponding completions). 

The condition in the statement of the theorem is obviously necessary and it is only 
its sufficiency which requires proof. Before embarking on this we establish one more 
property of the Hilbert symbol for the field Q of rational numbers. 


Proposition 35 For any $a, b \in \mathbb{Q}^\times$, the number of completions $\mathbb{Q}_v$ for which one has $(a, b)_v = -1$ (where $v$ denotes either $\infty$ or an arbitrary prime $p$) is finite and even.


Proof By Proposition 23, it is sufficient to establish the result when $a$ and $b$ are square-free integers such that $ab$ is also square-free. Then $(a, b)_r = 1$ for any odd prime $r$ which does not divide $ab$, by Proposition 25. We wish to show that $\prod_v (a, b)_v = 1$. Since the Hilbert symbol is multiplicative, it is sufficient to establish this in the following special cases: for $a = -1$ and $b = -1, 2, p$; for $a = 2$ and $b = p$; for $a = p$ and $b = q$, where $p$ and $q$ are distinct odd primes. But it follows from Propositions 24, 25 and 27 that

$$\prod_v (-1, -1)_v = (-1, -1)_\infty(-1, -1)_2 = (-1)(-1) = 1;$$
$$\prod_v (-1, 2)_v = (-1, 2)_\infty(-1, 2)_2 = 1 \cdot 1 = 1;$$
$$\prod_v (-1, p)_v = (-1, p)_p(-1, p)_2 = (-1/p)(-1)^{(p-1)/2};$$
$$\prod_v (2, p)_v = (2, p)_p(2, p)_2 = (2/p)(-1)^{(p^2-1)/8};$$
$$\prod_v (p, q)_v = (p, q)_p(p, q)_q(p, q)_2 = (q/p)(p/q)(-1)^{(p-1)(q-1)/4}.$$

Hence the proposition holds if and only if

$$(-1/p) = (-1)^{(p-1)/2}, \quad (2/p) = (-1)^{(p^2-1)/8}, \quad (q/p)(p/q) = (-1)^{(p-1)(q-1)/4}.$$

Thus it is actually equivalent to the law of quadratic reciprocity and its two 'supplements'.
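Proposition 35 can also be tested numerically. The following Python sketch uses hypothetical helper names and assumes the standard closed formulas for $(a, b)_p$. It lists the places $v$ with $(a, b)_v = -1$. Only $v = \infty$ and the primes dividing $2ab$ need to be examined, since for an odd prime not dividing $ab$ the symbol is 1. The sketch then checks that the number of such places is even for all small nonzero $a, b$.

```python
def legendre(u, p):
    """Legendre symbol (u/p), p an odd prime not dividing u."""
    return -1 if pow(u % p, (p - 1) // 2, p) == p - 1 else 1


def split(a, p):
    """Write a nonzero integer a as p**alpha * u with p not dividing u."""
    alpha = 0
    while a % p == 0:
        a, alpha = a // p, alpha + 1
    return alpha, a


def hilbert(a, b, p):
    """(a, b)_p for nonzero integers a, b and a prime p (standard closed formulas)."""
    (al, u), (be, v) = split(a, p), split(b, p)
    if p == 2:
        eps = lambda w: ((w - 1) // 2) % 2
        om = lambda w: ((w * w - 1) // 8) % 2
        return (-1) ** (eps(u) * eps(v) + al * om(v) + be * om(u))
    return (-1) ** (al * be * ((p - 1) // 2)) * legendre(u, p) ** be * legendre(v, p) ** al


def prime_divisors(n):
    """Distinct prime divisors of a nonzero integer, by trial division."""
    n, p, out = abs(n), 2, []
    while p * p <= n:
        if n % p == 0:
            out.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        out.append(n)
    return out


def places_with_minus_one(a, b):
    """The places v (the string 'inf' or a prime) at which (a, b)_v = -1."""
    bad = ["inf"] if a < 0 and b < 0 else []
    bad += [p for p in prime_divisors(2 * a * b) if hilbert(a, b, p) == -1]
    return bad


for a in range(-40, 41):
    for b in range(-40, 41):
        if a and b:
            assert len(places_with_minus_one(a, b)) % 2 == 0   # Proposition 35

print(places_with_minus_one(-1, -1), places_with_minus_one(3, 5))  # ['inf', 2] [3, 5]
```

Because the proof shows that Proposition 35 is equivalent to quadratic reciprocity and its supplements, a loop like this one also serves as a small sanity check of the reciprocity law.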


We are now ready to prove the Hasse–Minkowski theorem:

Theorem 36 A non-singular quadratic form $f(\xi_1, \ldots, \xi_n)$ with rational coefficients is isotropic in $\mathbb{Q}$ if and only if it is isotropic in every completion $\mathbb{Q}_v$.


Proof We may assume that the quadratic form is diagonal:

$$f = a_1\xi_1^2 + \cdots + a_n\xi_n^2,$$

where $a_k \in \mathbb{Q}^\times$ ($k = 1, \ldots, n$). Moreover, by replacing $\xi_k$ by $r_k\xi_k$ for suitable $r_k \in \mathbb{Q}^\times$, we may assume that each coefficient $a_k$ is a square-free integer.

The proof will be broken into three parts, according as $n = 2$, $n = 3$ or $n \ge 4$. The proofs for $n = 2$ and $n = 3$ are quite independent. The more difficult proof for $n \ge 4$ uses induction on $n$ and Dirichlet's theorem on primes in an arithmetic progression.


(i) $n = 2$: We show first that if $a \in \mathbb{Q}^\times$ is a square in $\mathbb{Q}_v^\times$ for all $v$, then $a$ is already a square in $\mathbb{Q}^\times$. Since $a$ is a square in $\mathbb{Q}_\infty^\times$, we have $a > 0$. Let $a = \prod_p p^{\alpha_p}$ be the factorization of $a$ into powers of distinct primes, where $\alpha_p \in \mathbb{Z}$ and $\alpha_p \neq 0$ for at most finitely many primes $p$. Since $|a|_p = p^{-\alpha_p}$ and $a$ is a square in $\mathbb{Q}_p$, $\alpha_p$ must be even. But if $\alpha_p = 2\beta_p$, then $a = b^2$, where $b = \prod_p p^{\beta_p}$.

Suppose now that $f = a_1\xi_1^2 + a_2\xi_2^2$ is isotropic in $\mathbb{Q}_v$ for all $v$. Then $a := -a_1a_2$ is a square in $\mathbb{Q}_v$ for all $v$ and hence, by what we have just proved, $a$ is a square in $\mathbb{Q}$. But if $a = b^2$, then $a_1a_2^2 + a_2b^2 = 0$ and thus $f$ is isotropic in $\mathbb{Q}$.


(ii) $n = 3$: By replacing $f$ by $-a_3f$ and $\xi_3$ by $a_3\xi_3$, we see that it is sufficient to prove the theorem for

$$f = a\xi^2 + b\eta^2 - \zeta^2,$$

where $a$ and $b$ are nonzero square-free integers. The quadratic form $f$ is isotropic in $\mathbb{Q}_v$ if and only if $(a, b)_v = 1$. If $a = 1$ or $b = 1$, then $f$ is certainly isotropic in $\mathbb{Q}$. Since $f$ is not isotropic in $\mathbb{Q}_\infty$ if $a = b = -1$, this proves the result if $|ab| = 1$. We will assume that the result does not hold for some pair $a, b$ and derive a contradiction. Choose a pair $a, b$ for which the result does not hold and for which $|ab|$ has its minimum value. Then $a \neq 1$, $b \neq 1$ and $|ab| \ge 2$. Without loss of generality we may assume $|a| \le |b|$, and then $|b| \ge 2$.
We are going to show that there exists an integer $c$ such that $c^2 \equiv a \bmod b$. Since $\pm b$ is a product of distinct primes, it is enough to show that the congruence $x^2 \equiv a \bmod p$ is soluble for each prime $p$ which divides $b$ (by Corollary II.38). Since this is obvious if $a \equiv 0$ or $1 \bmod p$, we may assume that $p$ is odd and $a$ not divisible by $p$. Then, since $f$ is isotropic in $\mathbb{Q}_p$, $(a, b)_p = 1$. Hence $a$ is a square mod $p$ by Proposition 25.

Consequently there exist integers $c, d$ such that $a = c^2 - bd$. Moreover, by adding to $c$ a suitable multiple of $b$ we may assume that $|c| \le |b|/2$. Then

$$|d| = |c^2 - a|/|b| \le |b|/4 + 1 < |b|$$

and $d \neq 0$, since $a$ is square-free and $a \neq 1$. We have

$$bd(a\xi^2 + b\eta^2 - \zeta^2) = aX^2 + dY^2 - Z^2,$$

where

$$X = c\xi + \zeta, \quad Y = b\eta, \quad Z = a\xi + c\zeta.$$

Moreover the linear transformation $(\xi, \eta, \zeta) \mapsto (X, Y, Z)$ is invertible in any field of zero characteristic, since $c^2 - a \neq 0$. Hence, since $f$ is isotropic in $\mathbb{Q}_v$ for all $v$, so also is $g = a\xi^2 + d\eta^2 - \zeta^2$. Since $f$ is not isotropic in $\mathbb{Q}$, by hypothesis, neither is $g$. But this contradicts the original choice of $f$, since $|ad| < |ab|$.
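The descent step in part (ii) is effective. The following Python sketch finds $c$ with $c^2 \equiv a \bmod b$ and $|c| \le |b|/2$ by brute force, which is adequate only for small $|b|$. It then computes $d = (c^2 - a)/b$ and checks the identity $bd(a\xi^2 + b\eta^2 - \zeta^2) = aX^2 + dY^2 - Z^2$ on a grid of integer points. The function names are hypothetical.

```python
def descent_step(a, b):
    """Find c, d with a = c^2 - b d and |c| <= |b|/2 (brute force over a complete residue system)."""
    m = abs(b)
    for c in range(-(m // 2), m // 2 + 1):
        if (c * c - a) % m == 0:
            return c, (c * c - a) // b
    raise ValueError("a is not a square modulo b")


def check_identity(a, b, c, d, xi, eta, zeta):
    """bd(a xi^2 + b eta^2 - zeta^2) == a X^2 + d Y^2 - Z^2, with X, Y, Z as in the text."""
    X, Y, Z = c * xi + zeta, b * eta, a * xi + c * zeta
    return b * d * (a * xi**2 + b * eta**2 - zeta**2) == a * X**2 + d * Y**2 - Z**2


c, d = descent_step(3, 11)     # 3 is congruent to 5^2 = 25 mod 11
print(c, d, abs(d) < 11)       # -5 2 True: the new pair (a, d) has |ad| < |ab|
```

Repeating the step with $(a, d)$ in place of $(a, b)$, after removing square factors from $d$, decreases $|ab|$ each time. This is the finite descent that the proof turns into a contradiction.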

It may be noted that for $n = 3$ it need only be assumed that $f$ is isotropic in $\mathbb{Q}_p$ for all primes $p$. For the preceding proof used the fact that $f$ is isotropic in $\mathbb{Q}_\infty$ only to exclude from consideration the quadratic form $-\xi^2 - \eta^2 - \zeta^2$, and this quadratic form is anisotropic also in $\mathbb{Q}_2$, by Proposition 27. In fact for $n = 3$ it need only be assumed that $f$ is isotropic in $\mathbb{Q}_v$ for all $v$ with at most one exception since, by Proposition 35, the number of exceptions must be even.


(iii) $n \ge 4$: We have

$$f = a_1\xi_1^2 + \cdots + a_n\xi_n^2,$$

where $a_1, \ldots, a_n$ are square-free integers. We write $f = g - h$, where

$$g = a_1\xi_1^2 + a_2\xi_2^2, \quad h = -a_3\xi_3^2 - \cdots - a_n\xi_n^2.$$

Let $S$ be the finite set consisting of $\infty$ and all primes $p$ which divide $2a_1 \cdots a_n$. By Proposition 7, for each $v \in S$ there exists $c_v \in \mathbb{Q}_v^\times$ which is represented in $\mathbb{Q}_v$ by both $g$ and $h$. We will show that we can take $c_v$ to be the same nonzero integer $c$ for every $v \in S$.

Let $v = p$ be a prime in $S$. By multiplying by a square in $\mathbb{Q}_p^\times$ we may assume that $c_p = p^{\varepsilon_p}c_p'$, where $\varepsilon_p = 0$ or $1$ and $|c_p'|_p = 1$. If $p$ is odd and if $b_p$ is an integer such that $|c_p - b_p|_p \le p^{-\varepsilon_p - 1}$, then $|b_p|_p = |c_p|_p$ and $|b_pc_p^{-1} - 1|_p \le p^{-1}$. Hence $b_pc_p^{-1}$ is a square in $\mathbb{Q}_p^\times$, by Proposition VI.16, and we can replace $c_p$ by $b_p$. Similarly if $p = 2$ and if $b_2$ is an integer such that $|c_2 - b_2|_2 \le 2^{-\varepsilon_2 - 3}$, then $|b_2|_2 = |c_2|_2$ and $|b_2c_2^{-1} - 1|_2 \le 2^{-3}$. Hence $b_2c_2^{-1}$ is a square in $\mathbb{Q}_2^\times$ and we can replace $c_2$ by $b_2$.

By the Chinese remainder theorem (Corollary II.38), the simultaneous congruences

$$c \equiv b_2 \bmod 2^{\varepsilon_2 + 3}, \quad c \equiv b_p \bmod p^{\varepsilon_p + 1} \text{ for every odd } p \in S,$$

have a solution $c \in \mathbb{Z}$ that is uniquely determined mod $m$, where $m = 4\prod_{p \in S} p^{\varepsilon_p + 1}$. In exactly the same way as before we can replace $b_p$ by $c$ for all primes $p \in S$. By choosing $c$ to have the same sign as $c_\infty$, we can take $c_v = c$ for all $v \in S$.

If $d = \prod_{p \in S} p^{\delta_p}$ is the greatest common divisor of $c$ and $m$ then, by Dirichlet's theorem on primes in an arithmetic progression, there exists an integer $k$ with the same sign as $c$ such that

$$c/d + km/d = \pm q,$$

where $q$ is a prime. If we put

$$a = c + km = \pm dq,$$

then $q$ is the only prime divisor of $a$ which is not in $S$ and the quadratic forms

$$g^* = a_1\xi_1^2 + a_2\xi_2^2 - a\xi_0^2, \quad h^* = a\xi_0^2 + a_3\xi_3^2 + \cdots + a_n\xi_n^2,$$

are isotropic in $\mathbb{Q}_v$ for every $v \in S$, since $c^{-1}a$ is a square in $\mathbb{Q}_v^\times$.
For all primes $p$ not in $S$, except $p = q$, $a$ is not divisible by $p$. Hence, by the definition of $S$ and Corollary 26, $g^*$ is isotropic in $\mathbb{Q}_v$ for all $v$, except possibly $v = q$. Consequently, by the final remark of part (ii) of the proof, $g^*$ is isotropic in $\mathbb{Q}$.

Suppose first that $n = 4$. In this case, in the same way, $h^* = a\xi_0^2 + a_3\xi_3^2 + a_4\xi_4^2$ is also isotropic in $\mathbb{Q}$. Hence, by Proposition 6, there exist $y_1, \ldots, y_4 \in \mathbb{Q}$ such that

$$a_1y_1^2 + a_2y_2^2 = a = -a_3y_3^2 - a_4y_4^2.$$

Thus $f$ is isotropic in $\mathbb{Q}$.

Suppose next that $n \ge 5$ and the result holds for all smaller values of $n$. Then the quadratic form $h^*$ is isotropic in $\mathbb{Q}_v$, not only for $v \in S$, but for all $v$. For if $p$ is a prime which is not in $S$, then $a_3, a_4, a_5$ are not divisible by $p$. It follows from Corollary 26 that the quadratic form $a_3\xi_3^2 + a_4\xi_4^2 + a_5\xi_5^2$ is isotropic in $\mathbb{Q}_p$, and hence $h^*$ is also. Since $h^*$ is a non-singular quadratic form in $n - 1$ variables, it follows from the induction hypothesis that $h^*$ is isotropic in $\mathbb{Q}$. The proof can now be completed in the same way as for $n = 4$.


Corollary 37 A non-singular rational quadratic form in $n \ge 5$ variables is isotropic in $\mathbb{Q}$ if and only if it is neither positive definite nor negative definite.

Proof This follows at once from Theorem 36, on account of Propositions 10 and 31.


Corollary 38 A non-singular quadratic form over the rational field $\mathbb{Q}$ represents a nonzero rational number $c$ in $\mathbb{Q}$ if and only if it represents $c$ in every completion $\mathbb{Q}_v$.

Proof Only the sufficiency of the condition requires proof. But if the rational quadratic form $f(\xi_1, \ldots, \xi_n)$ represents $c$ in $\mathbb{Q}_v$ for all $v$ then, by Theorem 36, the quadratic form

$$f^*(\xi_0, \xi_1, \ldots, \xi_n) = -c\xi_0^2 + f(\xi_1, \ldots, \xi_n)$$

is isotropic in $\mathbb{Q}$. Hence $f$ represents $c$ in $\mathbb{Q}$, by Proposition 6.


Proposition 39 Two non-singular quadratic forms with rational coefficients are equivalent over $\mathbb{Q}$ if and only if they are equivalent over all completions $\mathbb{Q}_v$.

Proof Again only the sufficiency of the condition requires proof. Let $f$ and $g$ be non-singular rational quadratic forms in $n$ variables which are equivalent over $\mathbb{Q}_v$ for all $v$.

Suppose first that $n = 1$ and that $f = a\xi^2$, $g = b\eta^2$. By hypothesis, for every $v$ there exists $t_v \in \mathbb{Q}_v^\times$ such that $b = at_v^2$. Thus $ba^{-1}$ is a square in $\mathbb{Q}_v^\times$ for every $v$, and hence $ba^{-1}$ is a square in $\mathbb{Q}^\times$, by part (i) of the proof of Theorem 36. Therefore $f$ is equivalent to $g$ over $\mathbb{Q}$.

Suppose now that $n > 1$ and the result holds for all smaller values of $n$. Choose some $c \in \mathbb{Q}^\times$ which is represented by $f$ in $\mathbb{Q}$. Then $f$ certainly represents $c$ in $\mathbb{Q}_v$ and hence $g$ represents $c$ in $\mathbb{Q}_v$, since $g$ is equivalent to $f$ over $\mathbb{Q}_v$. Since this holds for all $v$, it follows from Corollary 38 that $g$ represents $c$ in $\mathbb{Q}$.

Thus, by the remark after the proof of Proposition 2, $f$ is equivalent over $\mathbb{Q}$ to a quadratic form $c\xi_1^2 + f^*(\xi_2, \ldots, \xi_n)$ and $g$ is equivalent over $\mathbb{Q}$ to a quadratic form $c\xi_1^2 + g^*(\xi_2, \ldots, \xi_n)$. Since $f$ is equivalent to $g$ over $\mathbb{Q}_v$, it follows from Witt's cancellation theorem that $f^*(\xi_2, \ldots, \xi_n)$ is equivalent to $g^*(\xi_2, \ldots, \xi_n)$ over $\mathbb{Q}_v$. Since this holds for every $v$, it follows from the induction hypothesis that $f^*$ is equivalent to $g^*$ over $\mathbb{Q}$, and so $f$ is equivalent to $g$ over $\mathbb{Q}$.


Corollary 40 Two non-singular quadratic forms $f$ and $g$ in $n$ variables with rational coefficients are equivalent over the rational field $\mathbb{Q}$ if and only if

(i) $(\det f)/(\det g)$ is a square in $\mathbb{Q}^\times$,
(ii) $\operatorname{ind}^+ f = \operatorname{ind}^+ g$,
(iii) $s_p(f) = s_p(g)$ for every prime $p$.

Proof This follows at once from Proposition 39, on account of Propositions 10 and 34.


The strong Hasse principle (Theorem 36) says that a quadratic form is isotropic over the global field $\mathbb{Q}$ if (and only if) it is isotropic over all its local completions $\mathbb{Q}_v$. The so-named weak Hasse principle (Proposition 39) says that two quadratic forms are equivalent over $\mathbb{Q}$ if (and only if) they are equivalent over all $\mathbb{Q}_v$. These local-global principles have proved remarkably fruitful. They organize the subject, they can be extended to other situations and, even when they fail, they are still a useful guide. We describe some results which illustrate these remarks.

As mentioned at the beginning of this section, the strong Hasse principle continues to hold when the rational field is replaced by any algebraic number field. Waterhouse (1976) has established the weak Hasse principle for pairs of quadratic forms: if over every completion $\mathbb{Q}_v$ there is a change of variables taking both $f_1$ to $g_1$ and $f_2$ to $g_2$, then there is also such a change of variables over $\mathbb{Q}$. For quadratic forms over the field $F = K(t)$ of rational functions in one variable with coefficients from a field $K$, the weak Hasse principle always holds, and the strong Hasse principle holds for $K = \mathbb{R}$, but not for all fields $K$.

The strong Hasse principle also fails for polynomial forms over $\mathbb{Q}$ of degree $> 2$. For example, Selmer (1951) has shown that the cubic equation $3x^3 + 4y^3 + 5z^3 = 0$ has no nontrivial solutions in $\mathbb{Q}$, although it has nontrivial solutions in every completion $\mathbb{Q}_v$. However, Gusić (1995) has proved the weak Hasse principle for non-singular ternary cubic forms.

Finally, we draw attention to a remarkable local-global principle of Rumely (1986) for algebraic integer solutions of arbitrary systems of polynomial equations

$$f_1(\xi_1, \ldots, \xi_n) = \cdots = f_r(\xi_1, \ldots, \xi_n) = 0$$

with rational coefficients.
We now give some applications of the results which have been established. 


Proposition 41 A positive integer can be represented as the sum of the squares of three integers if and only if it is not of the form $4^nb$, where $n \ge 0$ and $b \equiv 7 \bmod 8$.


Proof The necessity of the condition is easily established. Since the square of any integer is congruent to 0, 1 or 4 mod 8, the sum of three squares cannot be congruent to 7 mod 8. For the same reason, if there exist integers $x, y, z$ such that $x^2 + y^2 + z^2 = 4^nb$, where $n \ge 1$ and $b$ is odd, then $x, y, z$ must all be even and thus $(x/2)^2 + (y/2)^2 + (z/2)^2 = 4^{n-1}b$. By repeating the argument $n$ times, we see that there is no such representation if $b \equiv 7 \bmod 8$.
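Both directions of Proposition 41 can be tested by brute force for small integers. The following Python sketch is only a numerical check over a finite range, not a proof, and its function names are hypothetical. It compares an exhaustive search for representations $x^2 + y^2 + z^2 = n$ with the arithmetic criterion "$n$ is not of the form $4^k(8m + 7)$".

```python
from math import isqrt


def is_sum_of_three_squares(n):
    """Exhaustive search for x^2 + y^2 + z^2 = n with 0 <= x, y (z found via isqrt)."""
    x = 0
    while x * x <= n:
        y = 0
        while x * x + y * y <= n:
            r = n - x * x - y * y
            if isqrt(r) ** 2 == r:
                return True
            y += 1
        x += 1
    return False


def is_excluded(n):
    """Is the positive integer n of the form 4^k (8m + 7)?"""
    while n % 4 == 0:
        n //= 4
    return n % 8 == 7


mismatches = [n for n in range(1, 1001) if is_sum_of_three_squares(n) == is_excluded(n)]
print(mismatches)   # prints []: the two criteria agree for 1 <= n <= 1000
```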

We show next that any positive integer which satisfies this necessary condition is the sum of three squares of rational numbers. We need only show that any positive integer $a \not\equiv 7 \bmod 8$, which is not divisible by 4, is represented in $\mathbb{Q}$ by the quadratic form

$$f = \xi_1^2 + \xi_2^2 + \xi_3^2.$$

For every odd prime $p$, $f$ is isotropic in $\mathbb{Q}_p$, by Corollary 26, and hence any integer is represented in $\mathbb{Q}_p$ by $f$, by Proposition 5. Since $a > 0$, $f$ certainly represents $a$ in $\mathbb{Q}_\infty$. By Corollary 38, it only remains to show that $f$ represents $a$ in $\mathbb{Q}_2$.

It is easily seen that if $a \equiv 1, 3$ or $5 \bmod 8$, then there exist integers $x_1, x_2, x_3 \in \{0, 1, 2\}$ such that

$$x_1^2 + x_2^2 + x_3^2 \equiv a \bmod 8.$$

Hence $a^{-1}(x_1^2 + x_2^2 + x_3^2)$ is a square in $\mathbb{Q}_2^\times$ and $f$ represents $a$ in $\mathbb{Q}_2$.

Again, if $a \equiv 2$ or $6 \bmod 8$, then $a \equiv 2, 6, 10$ or $14 \bmod 2^4$ and it is easily seen that there exist integers $x_1, x_2, x_3 \in \{0, 1, 2, 3\}$ such that

$$x_1^2 + x_2^2 + x_3^2 \equiv a \bmod 2^4.$$

Hence $a^{-1}(x_1^2 + x_2^2 + x_3^2)$ is a square in $\mathbb{Q}_2^\times$ and $f$ represents $a$ in $\mathbb{Q}_2$.
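The two congruence claims described as "easily seen" can be confirmed by enumeration. The following Python sketch uses a hypothetical helper name. It collects the residues of $x_1^2 + x_2^2 + x_3^2$ for $x_i$ in the stated digit sets. The moduli matter. For odd $a$, a sum $\equiv a \bmod 8$ makes $a^{-1}(x_1^2 + x_2^2 + x_3^2)$ a 2-adic unit $\equiv 1 \bmod 8$. For $a \equiv 2 \bmod 4$, one needs the congruence mod $2^4$ to reach the same conclusion.

```python
from itertools import product


def three_square_residues(modulus, digits):
    """Residues mod `modulus` of x1^2 + x2^2 + x3^2 with each x_i drawn from `digits`."""
    return {sum(x * x for x in xs) % modulus for xs in product(digits, repeat=3)}


mod8 = three_square_residues(8, range(3))     # x_i in {0, 1, 2}
mod16 = three_square_residues(16, range(4))   # x_i in {0, 1, 2, 3}
print({1, 3, 5} <= mod8, {2, 6, 10, 14} <= mod16)   # prints: True True
```

For instance $5 = 1 + 4 + 0$ and $14 = 1 + 4 + 9$.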


To complete the proof of the proposition we show, by an elegant argument due to 
Aubry (1912), that if f represents c in Q then it also represents c in Z. 
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Let 


(x,y) ={f@ + y) — f@) — FO)}/2 


be the symmetric bilinear form associated with f, so that f(x) = (x, x), and assume 
there exists a point x € Q° such that (x,x) = c € Z. If x ¢ Z?, we can choose 
z € Z so that each coordinate of z differs in absolute value by at most 1/2 from 
the corresponding coordinate of x. Hence if we put y = x — z, then y # O and 
0<(,y) < 3/4. 

If x’ = x — Ay, where 2 = 2(x, y)/(y, y), then x’ € Q? and (x’, x’) = (x, x) =c. 
Substituting y = x — z, we obtain 


(y, y)x’ = (y, y)x — 2(%, y)y = {(z, z) — (&, x) }x + 2{(, x) — @, z)}. 


If $m > 0$ is the least common denominator of the coordinates of $x$, so that $mx \in \mathbb{Z}^3$, it
follows that

$$m(y, y)x' = \{(z, z) - c\}mx + 2\{mc - (mx, z)\}z \in \mathbb{Z}^3.$$

But
$$m(y, y) = m\{(x, x) - 2(x, z) + (z, z)\} = mc - 2(mx, z) + m(z, z) \in \mathbb{Z}.$$

Thus if $m' > 0$ is the least common denominator of the coordinates of $x'$, then $m'$
divides $m(y, y)$. Hence $m' \le (3/4)m$. If $x' \notin \mathbb{Z}^3$, we can repeat the argument with
$x$ replaced by $x'$. After performing the process finitely many times we must obtain a
point $x^* \in \mathbb{Z}^3$ such that $(x^*, x^*) = c$.
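Aubry's argument is effectively an algorithm. The Python sketch below is a minimal illustration for $f = \xi_1^2 + \xi_2^2 + \xi_3^2$ using exact `fractions.Fraction` arithmetic; the function name `aubry_descent` and the sample starting points are our own choices. Each pass rounds $x$ to a nearest integral point $z$ and replaces $x$ by $x' = x - \lambda y$, so the common denominator shrinks by a factor of at most $3/4$ until an integral point is reached.

```python
from fractions import Fraction

# Hypothetical helper names; a sketch of the descent for f = x1^2 + x2^2 + x3^2.
def dot(u, v):
    """The bilinear form (u, v) associated with f."""
    return sum(a * b for a, b in zip(u, v))

def aubry_descent(x):
    """Given a rational point x with (x, x) = c an integer, return an
    integral point with the same value of f."""
    x = [Fraction(t) for t in x]
    c = dot(x, x)
    assert c.denominator == 1, "(x, x) must be an integer"
    while any(t.denominator != 1 for t in x):
        z = [Fraction(round(t)) for t in x]       # a nearest integral point
        y = [a - b for a, b in zip(x, z)]         # y = x - z, 0 < (y, y) <= 3/4
        lam = 2 * dot(x, y) / dot(y, y)
        x = [a - lam * b for a, b in zip(x, y)]   # x' = x - lam*y keeps (x', x') = c
        assert dot(x, x) == c
    return [int(t) for t in x]

print(aubry_descent([Fraction(1, 3), Fraction(1, 3), Fraction(5, 3)]))  # [1, 1, 1]
print(aubry_descent([Fraction(3, 5), Fraction(4, 5), 0]))              # [-1, 0, 0]
```

Ties at exactly $1/2$ are broken by Python's round-half-to-even rule, which the argument allows, since any nearest integer will do.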


As another application of the preceding results we now prove 


Proposition 42 Let $n, a, b$ be integers with $n > 1$. Then there exists a nonsingular
$n \times n$ rational matrix $A$ such that

$$A^tA = aI_n + bJ_n, \tag{3}$$

where $J_n$ is the $n \times n$ matrix with all entries 1, if and only if $a > 0$, $a + bn > 0$ and
(i) for $n$ odd: $a + bn$ is a square and the quadratic form $a\xi^2 + (-1)^{(n-1)/2}b\eta^2 - \zeta^2$
is isotropic in $\mathbb{Q}$;
(ii) for $n$ even: $a(a + bn)$ is a square and either $n \equiv 0 \bmod 4$, or $n \equiv 2 \bmod 4$ and $a$
is a sum of two squares.


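Proof If we put

$$B = \begin{pmatrix}
1 & 1 & 1 & \cdots & 1 & 1 \\
-1 & 1 & 1 & \cdots & 1 & 1 \\
0 & -2 & 1 & \cdots & 1 & 1 \\
0 & 0 & -3 & \cdots & 1 & 1 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & -(n-1) & 1
\end{pmatrix},$$

then $D := B^tB$ and $E := B^tJ_nB$ are diagonal matrices:

$$D = \operatorname{diag}[d_1, \ldots, d_{n-1}, n], \qquad E = \operatorname{diag}[0, \ldots, 0, n^2],$$

where $d_j = j(j + 1)$ for $1 \le j < n$. Hence, if $C = D^{-1}B^tAB$, then
$$C^tDC = B^tA^tAB.$$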

Thus the rational matrix $A$ satisfies (3) if and only if the rational matrix $C$ satisfies
$$C^tDC = aD + bE,$$

and consequently if and only if the diagonal quadratic forms

$$f = d_1\eta_1^2 + \cdots + d_{n-1}\eta_{n-1}^2 + n\eta_n^2, \qquad g = ad_1\eta_1^2 + \cdots + ad_{n-1}\eta_{n-1}^2 + n(a + bn)\eta_n^2$$

are equivalent over $\mathbb{Q}$.

We now apply Corollary 40. Since $(\det g)/(\det f) = a^{n-1}(a + bn)$, the condition
that $\det g/\det f$ be a square in $\mathbb{Q}^\times$ means that $a + bn$ is a nonzero square if $n$ is odd and
$a(a + bn)$ is a nonzero square if $n$ is even. Since $\operatorname{ind}^+ f = n$, the condition that
$\operatorname{ind}^+ g = \operatorname{ind}^+ f$ means that $a > 0$ and $a + bn > 0$. The relation $s_p(g) = s_p(f)$ takes the form

$$\prod_{1 \le i < j < n} (ad_i, ad_j)_p \prod_{1 \le i < n} (ad_i, n(a + bn))_p = \prod_{1 \le i < j < n} (d_i, d_j)_p \prod_{1 \le i < n} (d_i, n)_p.$$

The multiplicativity and symmetry of the Hilbert symbol imply that
$$(ad_i, ad_j)_p = (a, a)_p(a, d_id_j)_p(d_i, d_j)_p.$$
Since $(a, a)_p = (a, -1)_p$, it follows that $s_p(g) = s_p(f)$ if and only if
$$(a, -1)_p^{(n-1)(n-2)/2}(a, n)_p^{n-1} \prod_{1 \le i < n} (ad_i, a + bn)_p \prod_{1 \le i < j < n} (a, d_id_j)_p = 1.$$


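But

$$\prod_{1 \le i < j < n} d_id_j = (d_1 \cdots d_{n-1})^{n-2}$$

and, by the definition of $d_j$, $d_1 \cdots d_{n-1} = n\{(n-1)!\}^2$ is in the same rational square class as $n$. Hence
$s_p(g) = s_p(f)$ if and only if

$$(a, -1)_p^{(n-1)(n-2)/2}(a, n)_p(a, a + bn)_p^{n-1}(n, a + bn)_p = 1. \tag{4}$$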


If $n$ is odd, then $a + bn$ is a square and (4) reduces to $(a, (-1)^{(n-1)/2}n)_p = 1$.
But, since $a + bn$ is a square, the quadratic form $a\xi^2 + bn\eta^2 - \zeta^2$ is isotropic in $\mathbb{Q}$
and thus $(a, bn)_p = 1$ for all $p$. Hence $(a, (-1)^{(n-1)/2}n)_p = 1$ for all $p$ if and only if
$(a, (-1)^{(n-1)/2}b)_p = 1$ for all $p$. Since $a > 0$, this is equivalent to (i).

If $n$ is even, then $a(a + bn)$ is a square and (4) reduces to $(a, (-1)^{n/2})_p = 1$.
Since $a > 0$, this holds for all $p$ if and only if the ternary quadratic form

$$a\xi^2 + (-1)^{n/2}\eta^2 - \zeta^2$$

is isotropic in $\mathbb{Q}$. Thus it is certainly satisfied if $n \equiv 0 \bmod 4$. If $n \equiv 2 \bmod 4$ it is
satisfied if and only if the quadratic form $\xi^2 + \eta^2 - a\zeta^2$ is isotropic. Thus it is satisfied
if $a$ is a sum of two squares. It is not satisfied if $a$ is not a sum of two squares since
then, by Proposition II.39, for some prime $p \equiv 3 \bmod 4$, the highest power of $p$ which
divides $a$ is odd and

$$(a, (-1)^{n/2})_p = (a, -1)_p = (p, -1)_p = \left(\frac{-1}{p}\right) = -1.$$
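The matrix identities used at the start of this proof can be checked mechanically for small $n$. This Python sketch (helper names are ours) builds $B$ with 0-based indices, computes $B^tB$, and computes $B^tJ_nB$ as the outer product of the column sums of $B$, which is valid because $J_n = \mathbf{1}\mathbf{1}^t$. It then compares the results with $\operatorname{diag}[d_1, \ldots, d_{n-1}, n]$ and $\operatorname{diag}[0, \ldots, 0, n^2]$.

```python
# Hypothetical helper names; an illustrative check of B^t B = D and B^t J_n B = E.
def b_matrix(n):
    """The n x n matrix B, 0-based: column j-1 (1 <= j < n) has 1 in rows
    0..j-1 and -j in row j; the last column is all ones."""
    B = [[0] * n for _ in range(n)]
    for j in range(1, n):
        for i in range(j):
            B[i][j - 1] = 1
        B[j][j - 1] = -j
    for i in range(n):
        B[i][n - 1] = 1
    return B

def b_identities_hold(n):
    """Check B^t B = diag[d_1, ..., d_{n-1}, n] and B^t J_n B = diag[0, ..., 0, n^2]."""
    B = b_matrix(n)
    D = [[sum(B[i][r] * B[i][s] for i in range(n)) for s in range(n)] for r in range(n)]
    col_sums = [sum(B[i][r] for i in range(n)) for r in range(n)]          # B^t (1, ..., 1)^t
    E = [[col_sums[r] * col_sums[s] for s in range(n)] for r in range(n)]  # uses J_n = 1 1^t
    d = [j * (j + 1) for j in range(1, n)] + [n]
    D_expected = [[d[r] if r == s else 0 for s in range(n)] for r in range(n)]
    E_expected = [[n * n if r == s == n - 1 else 0 for s in range(n)] for r in range(n)]
    return D == D_expected and E == E_expected

print(all(b_identities_hold(n) for n in range(2, 11)))  # True
```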


It is worth noting that the last part of this proof shows that if a positive integer a is 
a sum of two rational squares, then it is also a sum of two squares of integers. 

It follows at once from Proposition 42 that, for any positive integer $n$, there is an
$n \times n$ rational matrix $A$ such that $A^tA = nI_n$ if and only if either $n$ is an odd square,
or $n \equiv 2 \bmod 4$ and $n$ is a sum of two squares, or $n \equiv 0 \bmod 4$ (the Hadamard matrix
case).

In Chapter V we considered not only Hadamard matrices, but also designs. We 
now use Proposition 42 to derive the necessary conditions for the existence of square 
2-designs which were obtained by Bruck, Ryser and Chowla (1949/50). Let $v, k, \lambda$ be
integers such that $0 < \lambda < k < v$ and $k(k - 1) = \lambda(v - 1)$. Since $k - \lambda + \lambda v = k^2$,
it follows from Proposition 42 that there exists a $v \times v$ rational matrix $A$ such that

$$A^tA = (k - \lambda)I_v + \lambda J_v$$

if and only if, either $v$ is even and $k - \lambda$ is a square, or $v$ is odd and the quadratic form

$$(k - \lambda)\xi^2 + (-1)^{(v-1)/2}\lambda\eta^2 - \zeta^2$$

is isotropic in $\mathbb{Q}$.

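A projective plane of order $d$ corresponds to a $(d^2 + d + 1, d + 1, 1)$ (square)
2-design. In this case $v$ is odd and $(v - 1)/2 = d(d + 1)/2$, so for $d \equiv 1$ or $2 \bmod 4$ the form
above is $d\xi^2 - \eta^2 - \zeta^2$. Thus Proposition 42 tells us that there is no projective plane of order
$d$ if $d$ is not a sum of two squares and $d \equiv 1$ or $2 \bmod 4$. In particular, there is no
projective plane of order 6.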
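The criterion for projective planes is easy to tabulate. As an illustrative sketch, and not a statement about which planes actually exist, the Python code below lists the orders $2 \le d \le 30$ that are ruled out because $d \equiv 1$ or $2 \bmod 4$ and $d$ is not a sum of two squares. The brute-force sum-of-two-squares test and the function names are our own.

```python
from math import isqrt

# Hypothetical helper names; an illustrative tabulation only.
def is_sum_of_two_squares(d):
    """Brute-force test whether d = x^2 + y^2 for some integers x, y >= 0."""
    return any(isqrt(d - x * x) ** 2 == d - x * x for x in range(isqrt(d) + 1))

def excluded_by_brc(d):
    """True if the condition derived above rules out a projective plane of order d."""
    return d % 4 in (1, 2) and not is_sum_of_two_squares(d)

print([d for d in range(2, 31) if excluded_by_brc(d)])  # [6, 14, 21, 22, 30]
```

Orders that pass this test, such as $d = 10$, are not shown to exist by it. The test only fails to exclude them.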

The existence of projective planes of any prime power order follows from the 
existence of finite fields of any prime power order. (All known projective planes are of 
prime power order, but even for d = 9 there are projective planes of the same order d 
which are not isomorphic.) Since there is no projective plane of order 6, the least order 
in doubt is $d = 10$. The condition derived from Proposition 42 is obviously satisfied
in this case, since

$$10\xi^2 - \eta^2 - \zeta^2 = 0$$

has the solution $\xi = \eta = 1$, $\zeta = 3$. However, Lam, Thiel and Swiercz (1989) have
announced that, nevertheless, there is no projective plane of order 10. The result was
obtained by a search involving thousands of hours of time on a supercomputer and does
not appear to have been independently verified.
4 Supplements


It was shown in the proof of Proposition 41 that if an integer can be represented as a 
sum of 3 squares of rational numbers, then it can be represented as a sum of 3 squares 
of integers. A similar argument was used by Cassels (1964) to show that if a poly- 
nomial can be represented as a sum of n squares of rational functions, then it can be 
represented as a sum of n squares of polynomials. This was immediately generalized 
by Pfister (1965) in the following way: 


Proposition 43 For any field $F$, if there exist scalars $a_1, \ldots, a_n \in F$ and rational
functions $r_1(t), \ldots, r_n(t) \in F(t)$ such that

$$p(t) = a_1r_1(t)^2 + \cdots + a_nr_n(t)^2$$

is a polynomial, then there exist polynomials $p_1(t), \ldots, p_n(t) \in F[t]$ such that
$$p(t) = a_1p_1(t)^2 + \cdots + a_np_n(t)^2.$$


Proof Suppose first that $n = 1$. We can write $r_1(t) = p_1(t)/q_1(t)$, where $p_1(t)$ and
$q_1(t)$ are relatively prime polynomials and $q_1(t)$ has leading coefficient 1. Since

$$p(t)q_1(t)^2 = a_1p_1(t)^2,$$

we must actually have $q_1(t) = 1$.

Suppose now that $n > 1$ and the result holds for all smaller values of $n$. We may
assume that $a_j \neq 0$ for all $j$, since otherwise the result follows from the induction
hypothesis. Suppose first that the quadratic form

$$\phi = a_1\xi_1^2 + \cdots + a_n\xi_n^2$$

is isotropic over $F$. In this case there exists an invertible linear transformation
$\xi_j = \sum_{k=1}^n \tau_{jk}\eta_k$ with $\tau_{jk} \in F$ $(1 \le j, k \le n)$ such that

$$\phi = \eta_1^2 - \eta_2^2 + \beta_3\eta_3^2 + \cdots + \beta_n\eta_n^2,$$

where $\beta_j \in F$ for all $j > 2$. If we substitute
$$\eta_1 = \{p(t) + 1\}/2, \qquad \eta_2 = \{p(t) - 1\}/2, \qquad \eta_j = 0 \text{ for all } j > 2,$$

we obtain a representation for $p(t)$ of the required form.
Thus we now suppose that $\phi$ is anisotropic over $F$. This implies that $\phi$ is also
anisotropic over $F(t)$, since otherwise there would exist a nontrivial representation

$$a_1q_1(t)^2 + \cdots + a_nq_n(t)^2 = 0,$$

where $q_j(t) \in F[t]$ $(1 \le j \le n)$, and by considering the terms of highest degree we
would obtain a contradiction.
By hypothesis there exists a representation

$$p(t) = a_1\{f_1(t)/f_0(t)\}^2 + \cdots + a_n\{f_n(t)/f_0(t)\}^2,$$




where $f_0(t), f_1(t), \ldots, f_n(t) \in F[t]$. Assume that $f_0$ does not divide $f_j$ for some
$j \in \{1, \ldots, n\}$. Then $d := \deg f_0 > 0$ and we can write

$$f_j(t) = g_j(t)f_0(t) + h_j(t),$$

where $g_j(t), h_j(t) \in F[t]$ and $\deg h_j < d$ $(1 \le j \le n)$.
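Let

$$(x, y) = \{\phi(x + y) - \phi(x) - \phi(y)\}/2$$
be the symmetric bilinear form associated with the quadratic form $\phi$ and put
$$f = (f_1, \ldots, f_n), \qquad g = (g_1, \ldots, g_n), \qquad h = (h_1, \ldots, h_n).$$
If
$$f_0^* = \{(g, g) - p\}f_0 - 2\{(f, g) - pf_0\}, \qquad f^* = \{(g, g) - p\}f - 2\{(f, g) - pf_0\}g,$$

and $f^* = (f_1^*, \ldots, f_n^*)$, then clearly $f_0^*, f_1^*, \ldots, f_n^* \in F[t]$. Since $(f, f) = pf_0^2$ and
$g = (f - h)/f_0$, we can also write

$$f_0^* = (h, h)/f_0, \qquad f^* = \{(h, h)f - 2(f, h)h\}/f_0^2.$$

It follows that $\deg f_0^* < d$ (since $\deg (h, h) \le 2d - 2$) and $(f^*, f^*) = pf_0^{*2}$. Also $f_0^* \neq 0$, since $h \neq 0$ and $\phi$ is
anisotropic. Thus

$$p(t) = a_1\{f_1^*(t)/f_0^*(t)\}^2 + \cdots + a_n\{f_n^*(t)/f_0^*(t)\}^2.$$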


If $f_0^*$ does not divide $f_j^*$ for some $j \in \{1, \ldots, n\}$, we can repeat the process. After at
most $d$ steps we must obtain a representation for $p(t)$ of the required form.
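The descent in this proof can be run by machine. The following Python sketch works over $F = \mathbb{Q}$, stores polynomials as coefficient lists (lowest degree first), and uses the anisotropic form $\phi = \xi_1^2 + \xi_2^2$. All helper names are hypothetical. It applies the first pair of formulas for $f_0^*$ and $f^*$, which need only ring operations. The starting representation $t^2 + 1 = \{(t^3 - 3t)/(t^2 + 1)\}^2 + \{(3t^2 - 1)/(t^2 + 1)\}^2$ is our own example, and one step of the descent turns it into $t^2 + 1 = (-t)^2 + 1^2$.

```python
from fractions import Fraction

# Hypothetical helpers: polynomials over Q as coefficient lists, lowest degree first.
def trim(p):
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def poly(*coeffs):
    return trim([Fraction(c) for c in coeffs])

def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)])

def scale(c, p):
    return trim([c * x for x in p])

def mul(p, q):
    if not p or not q:
        return []
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            r[i + j] += x * y
    return trim(r)

def divmod_poly(f, d):
    """Return (g, h) with f = g*d + h and deg h < deg d (d nonzero)."""
    f = trim(f)
    g = [Fraction(0)] * max(len(f) - len(d) + 1, 1)
    while len(f) >= len(d):
        c, k = f[-1] / d[-1], len(f) - len(d)
        g[k] = c
        f = add(f, scale(-c, [0] * k + d))   # cancels the leading term of f
    return trim(g), f

def form(a, u, v):
    """The bilinear form (u, v) = sum_j a_j u_j v_j on vectors of polynomials."""
    total = []
    for aj, uj, vj in zip(a, u, v):
        total = add(total, scale(aj, mul(uj, vj)))
    return total

def cassels_pfister(a, p, f0, f):
    """Given p = sum_j a_j (f_j/f0)^2 with phi anisotropic, return polynomials
    p_j with p = sum_j a_j p_j^2, by repeating the descent step."""
    while True:
        qr = [divmod_poly(fj, f0) for fj in f]
        g = [gj for gj, _ in qr]
        if all(not hj for _, hj in qr):                        # f0 divides every f_j
            return g
        gg_p = add(form(a, g, g), scale(-1, p))                # (g, g) - p
        fg_pf0 = add(form(a, f, g), scale(-1, mul(p, f0)))     # (f, g) - p f0
        f0 = add(mul(gg_p, f0), scale(-2, fg_pf0))             # f0*
        f = [add(mul(gg_p, fj), scale(-2, mul(fg_pf0, gj))) for fj, gj in zip(f, g)]  # f*

a = [Fraction(1), Fraction(1)]
p = poly(1, 0, 1)                                              # t^2 + 1
result = cassels_pfister(a, p, poly(1, 0, 1), [poly(0, -3, 0, 1), poly(-1, 0, 3)])
print([[int(c) for c in pj] for pj in result])                 # [[0, -1], [1]], i.e. -t and 1
```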


It was already known to Hilbert (1888) that there is no analogue of Proposition 43
for polynomials in more than one variable. Motzkin (1967) gave the simple example

$$p(x, y) = 1 - 3x^2y^2 + x^4y^2 + x^2y^4,$$

which is a sum of 4 squares in $\mathbb{R}(x, y)$, but is not a sum of any finite number of squares
in $\mathbb{R}[x, y]$.

In the same paper in which he proved Proposition 43, Pfister introduced his
multiplicative forms. The quadratic forms $f_a$, $f_{a,b}$ in §2 are examples of such forms.
Pfister (1966) used his multiplicative forms to obtain several new results on the 
structure of the Witt ring and then (1967) to give a strong solution to Hilbert’s 17th 
Paris problem. We restrict attention here to the latter application. 

Let $g(x), h(x) \in \mathbb{R}[x]$ be polynomials in $n$ variables $x = (\xi_1, \ldots, \xi_n)$ with real
coefficients. The rational function $f(x) = g(x)/h(x)$ is said to be positive definite if
$f(a) \ge 0$ for every $a \in \mathbb{R}^n$ such that $h(a) \neq 0$. Hilbert's 17th problem asks if every
positive definite rational function can be represented as a sum of squares:

$$f(x) = f_1(x)^2 + \cdots + f_s(x)^2,$$

where $f_1(x), \ldots, f_s(x) \in \mathbb{R}(x)$. The question was answered affirmatively by
Artin (1927). Artin's solution allowed the number $s$ of squares to depend on the
function $f$, and left open the possibility that there might be no uniform bound.
Pfister showed that one can always take $s = 2^n$.

Finally we mention a conjecture of Oppenheim (1929–1953), that if $f(\xi_1, \ldots, \xi_n)$
is a non-singular isotropic real quadratic form in $n \ge 3$ variables, which is not a scalar
multiple of a rational quadratic form, then $f(\mathbb{Z}^n)$ is dense in $\mathbb{R}$, i.e. for each $\alpha \in \mathbb{R}$ and
$\varepsilon > 0$ there exist $z_1, \ldots, z_n \in \mathbb{Z}$ such that $|f(z_1, \ldots, z_n) - \alpha| < \varepsilon$. (It is not difficult
to show that this is not always true for $n = 2$.) Raghunathan (1980) made a general
conjecture about Lie groups, which he observed would imply Oppenheim’s conjec- 
ture. Oppenheim’s conjecture was then proved in this way by Margulis (1987), using 
deep results from the theory of Lie groups and ergodic theory. The full conjecture of 
Raghunathan has now also been proved by Ratner (1991). 


5 Further Remarks 


Lam [18] gives a good introduction to the arithmetic theory of quadratic spaces. The 
Hasse–Minkowski theorem is also proved in Serre [29]. Additional information is
contained in the books of Cassels [4], Kitaoka [16], Milnor and Husemoller [20], 
O’Meara [22] and Scharlau [28]. 

Quadratic spaces were introduced (under the name ‘metric spaces’) by Witt [32]. 
This noteworthy paper also made several other contributions: Witt’s cancellation theo- 
rem, the Witt ring, Witt’s chain equivalence theorem and the Hasse invariant in its most 
general form (as described below). Quadratic spaces are treated not only in books on 
the arithmetic of quadratic forms, but also in works of a purely algebraic nature, such 
as Artin [1], Dieudonné [8] and Jacobson [15]. 

An important property of the Witt ring was established by Merkur’ev (1981). In 
one formulation it says that every element of order 2 in the Brauer group of a field 
$F$ is represented by the Clifford algebra of some quadratic form over $F$. For a clear
account, see Lewis [19]. 

Our discussion of Hilbert fields is based on Frohlich [9]. It may be shown that any 
locally compact non-archimedean valued field is a Hilbert field. Frohlich gives other 
examples, but rightly remarks that the notion of Hilbert field clarifies the structure of 
the theory, even if one is interested only in the p-adic case. (The name ‘Hilbert field’ 
is also given to fields for which Hilbert’s irreducibility theorem is valid.) 

In the study of quadratic forms over an arbitrary field F, the Hilbert symbol 
(a, b/F) is a generalized quaternion algebra (more strictly, an equivalence class of 
such algebras) and the Hasse invariant is a tensor product of Hilbert symbols. See, for 
example, Lam [18]. 

Hasse's original proof of the Hasse–Minkowski theorem is reproduced in
Hasse [13]. In principle it is the same as that given here, using a reduction argument
due to Lagrange for $n = 3$ and Dirichlet's theorem on primes in an arithmetic progres-
sion for $n \ge 4$.

The book of Cassels contains a proof of Theorem 36 which does not use 
Dirichlet’s theorem, but it uses intricate results on genera of quadratic forms and is 
not so ‘clean’. However, Conway [6] has given an elementary approach to the equivalence
of quadratic forms over $\mathbb{Q}$ (Proposition 39 and Corollary 40).

The book of O’Meara gives a proof of the Hasse–Minkowski theorem over any
algebraic number field which avoids Dirichlet’s theorem and is ‘cleaner’ than ours, but 
it uses deep results from class field theory. For the latter, see Cassels and Frohlich [5], 
Garbanati [10] and Neukirch [21]. 

To determine if a rational quadratic form $f(\xi_1, \ldots, \xi_n) = \sum_{j,k=1}^n a_{jk}\xi_j\xi_k$ is
isotropic by means of Theorem 36 one has to show that it is isotropic in infinitely
many completions. Nevertheless, the problem is a finite one. Clearly one may assume
that the coefficients $a_{jk}$ are integers and, if the equation $f(x_1, \ldots, x_n) = 0$ has a
nontrivial solution in rational numbers, then it also has a nontrivial solution in integers.
But Cassels has shown by elementary arguments that if $f(x_1, \ldots, x_n) = 0$ for some
$x_j \in \mathbb{Z}$, not all zero, then the $x_j$ may be chosen so that

$$\max_{1 \le j \le n} |x_j| \le (3H)^{(n-1)/2},$$

where $H = \sum_{j,k=1}^n |a_{jk}|$. See Lemma 8.1 in Chapter 6 of [4].
Williams [31] gives a sharper result for the ternary quadratic form

$$g(\xi, \eta, \zeta) = a\xi^2 + b\eta^2 + c\zeta^2,$$

where $a, b, c$ are integers with greatest common divisor $d > 0$. If $g(x, y, z) = 0$ for
some integers $x, y, z$, not all zero, then these integers may be chosen so that

$$|x| \le |bc|^{1/2}/d, \qquad |y| \le |ca|^{1/2}/d, \qquad |z| \le |ab|^{1/2}/d.$$
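Williams' bound turns the isotropy question for a diagonal ternary form into a finite search. The Python sketch below simply searches the box given by the bound quoted above. The bound is taken on trust from [31], the function name is ours, and $a, b, c$ are assumed nonzero. For $10\xi^2 - \eta^2 - \zeta^2$ the search finds a solution. For $\xi^2 + \eta^2 - 3\zeta^2$, which is anisotropic over $\mathbb{Q}$, the box contains none.

```python
from math import gcd, isqrt

# Hypothetical helper name; assumes a, b, c are nonzero integers.
def small_solution(a, b, c):
    """Search |x| <= |bc|^(1/2)/d, |y| <= |ca|^(1/2)/d, |z| <= |ab|^(1/2)/d,
    where d = gcd(a, b, c), for a nontrivial zero of a*x^2 + b*y^2 + c*z^2.
    Returns None if the box contains none."""
    d = gcd(gcd(a, b), c)
    X, Y, Z = (isqrt(abs(u * v)) // d for u, v in ((b, c), (c, a), (a, b)))
    for x in range(-X, X + 1):
        for y in range(-Y, Y + 1):
            for z in range(-Z, Z + 1):
                if (x, y, z) != (0, 0, 0) and a * x * x + b * y * y + c * z * z == 0:
                    return x, y, z
    return None

print(small_solution(10, -1, -1))  # (-1, -3, -1)
print(small_solution(1, 1, -3))    # None
```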


The necessity of the Bruck–Ryser–Chowla conditions for the existence of
symmetric block designs may also be established in a more elementary way, without 
also proving their sufficiency for rational equivalence. See, for example, Beth et al. [2]. 
For the non-existence of a projective plane of order 10, see C. Lam [17]. 

For various manifestations of the local-global principle, see Waterhouse [30], 
Hsia [14], Gusié [12] and Green et al. [11]. 

The work of Pfister instigated a flood of papers on the algebraic theory of quadratic 
forms. The books of Lam and Scharlau give an account of these developments. For 
Hilbert’s 17th problem, see also Pfister [23], [24] and Rajwade [25]. 

Although a positive integer which is a sum of $n$ rational squares is also a sum of $n$
squares of integers, the same does not hold for higher powers. For example,

$$5906 = (149/17)^4 + (25/17)^4,$$

but there do not exist integers $m, n$ such that $5906 = m^4 + n^4$, since $9^4 > 5906$,
$2 \cdot 7^4 < 5906$ and $5906 - 8^4 = 1810$ is not a fourth power. For the representation of a
polynomial as a sum of squares of polynomials, see Rudin [27].
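The arithmetic in the fourth-power example can be confirmed with exact rational arithmetic. The Python sketch below checks the rational representation and then exhausts the integer candidates $0 \le n \le m \le 8$.

```python
from fractions import Fraction

# Rational representation 5906 = (149/17)^4 + (25/17)^4.
rational_ok = Fraction(149, 17) ** 4 + Fraction(25, 17) ** 4 == 5906
# Integer representations m^4 + n^4 with 0 <= n <= m <= 8 (9^4 > 5906 already).
integer_reps = [(m, n) for m in range(9) for n in range(m + 1) if m**4 + n**4 == 5906]

print(rational_ok, integer_reps)  # True []
```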

For Oppenheim’s conjecture, see Dani and Margulis [7], Borel [3] and Ratner [26]. 
VIII


The Geometry of Numbers 


It was shown by Hermite (1850) that if
$$f(x) = x^tAx$$

is a positive definite quadratic form in $n$ real variables, then there exists a vector $x$ with
integer coordinates, not all zero, such that

$$f(x) \le c_n(\det A)^{1/n},$$

where $c_n$ is a positive constant depending only on $n$. Minkowski (1891) found a new
and more geometric proof of Hermite's result, which gave a much smaller value for the
constant $c_n$. Soon afterwards (1893) he noticed that his proof was valid not only for an
n-dimensional ellipsoid f(x) < const., but for any convex body which was symmetric 
about the origin. This led him to a large body of results, to which he gave the somewhat 
paradoxical name ‘geometry of numbers’. It seems fair to say that Minkowski was the 
first to realize the importance of convexity for mathematics, and it was in his lattice 
point theorem that he first encountered it. 


1 Minkowski’s Lattice Point Theorem 


A set $C \subseteq \mathbb{R}^n$ is said to be convex if $x_1, x_2 \in C$ implies $\theta x_1 + (1 - \theta)x_2 \in C$ for
$0 < \theta < 1$. Geometrically, this means that whenever two points belong to the set the
whole line segment joining them is also contained in the set.

The indicator function or ‘characteristic function’ of a set $S \subseteq \mathbb{R}^n$ is defined by
$\chi(x) = 1$ or $0$ according as $x \in S$ or $x \notin S$. If the indicator function is Lebesgue
integrable, then the set $S$ is said to have volume

$$\lambda(S) = \int_{\mathbb{R}^n} \chi(x)\,dx.$$


The indicator function of a convex set $C$ is actually Riemann integrable. It is easily
seen that if a convex set $C$ is not contained in a hyperplane of $\mathbb{R}^n$, then its interior
$\operatorname{int} C$ (see §4 of Chapter I) is not empty. It follows that $\lambda(C) = 0$ if and only if $C$ is
contained in a hyperplane, and $0 < \lambda(C) < \infty$ if and only if $C$ is bounded and is not
contained in a hyperplane.

A set $S \subseteq \mathbb{R}^n$ is said to be symmetric (with respect to the origin) if $x \in S$ implies
$-x \in S$. Evidently any (nonempty) symmetric convex set contains the origin.

A point $x = (\xi_1, \ldots, \xi_n) \in \mathbb{R}^n$ whose coordinates $\xi_1, \ldots, \xi_n$ are all integers will
be called a lattice point. Thus the set of all lattice points in $\mathbb{R}^n$ is $\mathbb{Z}^n$.

These definitions are the ingredients for Minkowski’s lattice point theorem: 


Theorem 1 Let $C$ be a symmetric convex set in $\mathbb{R}^n$. If $\lambda(C) > 2^n$, or if $C$ is compact
and $\lambda(C) = 2^n$, then $C$ contains a nonzero point of $\mathbb{Z}^n$.


The proof of Theorem 1 will be deferred to §3. Here we illustrate the utility of the
result by giving several applications, all of which go back to Minkowski himself.


Proposition 2 If $A$ is an $n \times n$ positive definite real symmetric matrix, then there exists
a nonzero point $x \in \mathbb{Z}^n$ such that

$$x^tAx \le c_n(\det A)^{1/n},$$
where $c_n = (4/\pi)\{(n/2)!\}^{2/n}$.


Proof For any $\rho > 0$ the ellipsoid $x^tAx \le \rho$ is a compact symmetric convex
set. By putting $A = T^tT$, for some nonsingular matrix $T$, it may be seen that the
volume of this set is $\kappa_n\rho^{n/2}(\det A)^{-1/2}$, where $\kappa_n$ is the volume of the $n$-dimensional
unit ball. It follows from Theorem 1 that the ellipsoid contains a nonzero lattice point
if $\kappa_n\rho^{n/2}(\det A)^{-1/2} = 2^n$. But, as we will see in §4 of Chapter IX, $\kappa_n = \pi^{n/2}/(n/2)!$,
where $x! = \Gamma(x + 1)$. This gives the value $c_n(\det A)^{1/n}$ for $\rho$.


It follows from Stirling's formula (Chapter IX, §4) that $c_n \sim 2n/(\pi e)$ as $n \to \infty$.
Hermite had proved Proposition 2 with $c_n = (4/3)^{(n-1)/2}$. Hermite's value is smaller
than Minkowski's for $1 < n \le 8$, but much larger for large $n$.
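The comparison between Hermite's and Minkowski's constants is a short numerical computation. In the Python sketch below (function names ours), $(n/2)!$ is evaluated as $\Gamma(n/2 + 1)$ through `math.lgamma`, which avoids overflow for large $n$. The last lines compare $c_n$ with the asymptotic value $2n/(\pi e)$.

```python
import math

# Hypothetical helper names; an illustrative numerical comparison.
def minkowski_c(n):
    """Minkowski's c_n = (4/pi) * ((n/2)!)^(2/n), with (n/2)! = Gamma(n/2 + 1)."""
    return (4 / math.pi) * math.exp(2 * math.lgamma(n / 2 + 1) / n)

def hermite_c(n):
    """Hermite's c_n = (4/3)^((n-1)/2)."""
    return (4 / 3) ** ((n - 1) / 2)

print([n for n in range(2, 13) if hermite_c(n) < minkowski_c(n)])  # [2, 3, 4, 5, 6, 7, 8]

# Ratio of c_n to the Stirling approximation 2n/(pi e); it tends to 1, slowly.
n = 10000
print(minkowski_c(n) / (2 * n / (math.pi * math.e)))
```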

As a second application of Theorem 1 we prove Minkowski's linear forms theorem:


Proposition 3 Let $A$ be an $n \times n$ real matrix with determinant $\pm 1$. Then there exists
a nonzero point $x \in \mathbb{Z}^n$ such that $Ax = y = (\eta_k)$ satisfies

$$|\eta_1| \le 1, \qquad |\eta_k| < 1 \text{ for } 1 < k \le n.$$


Proof For any positive integer $m$, let $C_m$ be the set of all $x \in \mathbb{R}^n$ such that $Ax \in D_m$,
where

$$D_m = \{y = (\eta_k) \in \mathbb{R}^n : |\eta_1| < 1 + 1/m,\ |\eta_k| < 1 \text{ for } 2 \le k \le n\}.$$

Then $C_m$ is a symmetric convex set, since $A$ is linear and $D_m$ is symmetric and convex.
Moreover $\lambda(C_m) = 2^n(1 + 1/m)$, since $\lambda(D_m) = 2^n(1 + 1/m)$ and $A$ is
volume-preserving. Therefore, by Theorem 1, $C_m$ contains a lattice point $x_m \neq 0$. Since
$C_m \subseteq C_1$ for all $m > 1$ and the number of lattice points in $C_1$ is finite, there exist only
finitely many distinct points $x_m$. Thus there exists a lattice point $x \neq 0$ which belongs
to $C_m$ for infinitely many $m$. Evidently $x$ has the required properties.
The continued fraction algorithm enables one to find rational approximations to
irrational numbers. The subject of Diophantine approximation is concerned with the 
more general problem of solving inequalities in integers. From Proposition 3 we can 
immediately obtain a result in this area due to Dirichlet (1842): 


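Proposition 4 Let $A = (\alpha_{jk})$ be an $n \times m$ real matrix and let $t > 1$ be real. Then there
exist integers $q_1, \ldots, q_m, p_1, \ldots, p_n$, with $0 < \max(|q_1|, \ldots, |q_m|) < t^{n/m}$, such that

$$\Big|\sum_{k=1}^m \alpha_{jk}q_k - p_j\Big| \le 1/t \qquad (1 \le j \le n).$$


Proof Since the matrix

$$\begin{pmatrix} t^{-n/m}I_m & 0 \\ tA & tI_n \end{pmatrix}$$

has determinant 1, it follows from Proposition 3 (applied after a permutation of the rows, which only
changes the sign of the determinant, so that the one non-strict inequality falls on a row of $(tA \;\; tI_n)$)
that there exists a nonzero vector

$$x = \begin{pmatrix} q \\ -p \end{pmatrix} \in \mathbb{Z}^{m+n}$$

such that

$$|q_k| < t^{n/m} \quad (k = 1, \ldots, m), \qquad \Big|\sum_{k=1}^m \alpha_{jk}q_k - p_j\Big| \le 1/t \quad (1 \le j \le n).$$

Since $q = 0$ would imply $|p_j| < 1$ for all $j$ and hence $p = 0$, which contradicts
$x \neq 0$, we must have $\max_k |q_k| > 0$.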
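For $m = n = 1$, Proposition 4 is Dirichlet's classical theorem, and a brute-force search illustrates it. In this Python sketch (the function name is ours) we take $q > 0$, which is harmless since $(q, p)$ may be replaced by $(-q, -p)$, and we choose $p$ as the integer nearest to $q\alpha$.

```python
import math

# Hypothetical helper name; the case m = n = 1 of Proposition 4 by direct search.
def dirichlet_pair(alpha, t):
    """Return the first (q, p) with 0 < q < t and |q*alpha - p| <= 1/t."""
    q = 1
    while q < t:
        p = round(q * alpha)          # best integer p for this q
        if abs(q * alpha - p) <= 1 / t:
            return q, p
        q += 1
    return None                       # Proposition 4 says this is not reached for t > 1

print(dirichlet_pair(math.sqrt(2), 10))  # (5, 7)
```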


Corollary 5 Let $A = (\alpha_{jk})$ be an $n \times m$ real matrix such that $Az \notin \mathbb{Z}^n$ for any nonzero
vector $z \in \mathbb{Z}^m$. Then there exist infinitely many $(m + n)$-tuples $q_1, \ldots, q_m, p_1, \ldots, p_n$
of integers with greatest common divisor 1 and with arbitrarily large values of

$$\|q\| = \max(|q_1|, \ldots, |q_m|)$$

such that

$$\Big|\sum_{k=1}^m \alpha_{jk}q_k - p_j\Big| < \|q\|^{-m/n} \qquad (1 \le j \le n).$$


Proof Let $q_1, \ldots, q_m, p_1, \ldots, p_n$ be integers satisfying the conclusions of Proposition 4
for some $t > 1$. Evidently we may assume that $q_1, \ldots, q_m, p_1, \ldots, p_n$ have
no common divisor greater than 1. For given $q_1, \ldots, q_m$, let $\delta_j$ be the distance of
$\sum_{k=1}^m \alpha_{jk}q_k$ from the nearest integer and put $\delta = \max_{1 \le j \le n} \delta_j$. By hypothesis
$0 < \delta < 1$, and by construction

$$\delta \le 1/t < \|q\|^{-m/n}.$$

Choosing some $t' > 2/\delta$, we find a new set of integers $q_1', \ldots, q_m', p_1', \ldots, p_n'$
satisfying the same requirements with $t$ replaced by $t'$, and hence with $\delta' \le 1/t' < \delta/2$.
Proceeding in this way, we obtain a sequence of $(m + n)$-tuples of integers
$q_1^{(\nu)}, \ldots, q_m^{(\nu)}, p_1^{(\nu)}, \ldots, p_n^{(\nu)}$ for which $\delta^{(\nu)} \to 0$ and hence $\|q^{(\nu)}\| \to \infty$, since we
cannot have $q^{(\nu)} = q$ for infinitely many $\nu$.


The hypothesis of the corollary is certainly satisfied if $1, \alpha_{j1}, \ldots, \alpha_{jm}$ are linearly
independent over the field $\mathbb{Q}$ of rational numbers for some $j \in \{1, \ldots, n\}$.

Minkowski also used his lattice point theorem to give the first proof that the dis- 
criminant of any algebraic number field, other than Q, has absolute value greater 
than 1. The proof is given in most books on algebraic number theory. 


2 Lattices 


In the previous section we defined the set of lattice points to be $\mathbb{Z}^n$. However, this
definition is tied to a particular coordinate system in $\mathbb{R}^n$. It is useful to consider lattices
from a more intrinsic point of view. The key property is ‘discreteness’.

With vector addition as the group operation, $\mathbb{R}^n$ is an abelian group. A subgroup $\Lambda$
is said to be discrete if there exists a ball with centre $0$ which contains no other point
of $\Lambda$. (More generally, a subgroup $H$ of a topological group $G$ is said to be discrete if
there exists an open set $U \subseteq G$ such that $H \cap U = \{e\}$, where $e$ is the identity element
of $G$.)

If $\Lambda$ is a discrete subgroup of $\mathbb{R}^n$, then any bounded subset of $\mathbb{R}^n$ contains at most
finitely many points of $\Lambda$ since, if there were infinitely many, they would have an
accumulation point and their differences would accumulate at $0$. In particular, $\Lambda$ is a
closed subset of $\mathbb{R}^n$.


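Proposition 6 If $x_1, \ldots, x_m$ are linearly independent vectors in $\mathbb{R}^n$, then the set
$$\Lambda = \{\zeta_1x_1 + \cdots + \zeta_mx_m : \zeta_1, \ldots, \zeta_m \in \mathbb{Z}\}$$
is a discrete subgroup of $\mathbb{R}^n$.


Proof It is clear that $\Lambda$ is a subgroup of $\mathbb{R}^n$, since $x, y \in \Lambda$ implies $x - y \in \Lambda$. If $\Lambda$
is not discrete, then there exist $y^{(\nu)} \in \Lambda$ with $|y^{(1)}| > |y^{(2)}| > \cdots$ and $|y^{(\nu)}| \to 0$ as
$\nu \to \infty$. Let $V$ be the vector subspace of $\mathbb{R}^n$ with basis $x_1, \ldots, x_m$ and for any vector

$$x = \alpha_1x_1 + \cdots + \alpha_mx_m,$$
where $\alpha_k \in \mathbb{R}$ $(1 \le k \le m)$, put
$$\|x\| = \max(|\alpha_1|, \ldots, |\alpha_m|).$$
This defines a norm on $V$. We have
$$y^{(\nu)} = \zeta_1^{(\nu)}x_1 + \cdots + \zeta_m^{(\nu)}x_m,$$
where $\zeta_k^{(\nu)} \in \mathbb{Z}$ $(1 \le k \le m)$. Since any two norms on a finite-dimensional vector
space are equivalent (Lemma VI.7), it follows that $\zeta_k^{(\nu)} \to 0$ as $\nu \to \infty$ $(1 \le k \le m)$.

Since $\zeta_k^{(\nu)}$ is an integer, this is only possible if $y^{(\nu)} = 0$ for all large $\nu$, which is a
contradiction.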
The converse of Proposition 6 is also valid. In fact we will prove a sharper result:


Proposition 7 If $\Lambda$ is a discrete subgroup of $\mathbb{R}^n$, then there exist linearly independent
vectors $x_1, \ldots, x_m$ in $\mathbb{R}^n$ such that

$$\Lambda = \{\zeta_1x_1 + \cdots + \zeta_mx_m : \zeta_1, \ldots, \zeta_m \in \mathbb{Z}\}.$$

Furthermore, if $y_1, \ldots, y_m$ is any maximal set of linearly independent vectors in $\Lambda$,
we can choose $x_1, \ldots, x_m$ so that

$$\Lambda \cap \langle y_1, \ldots, y_k\rangle = \{\zeta_1x_1 + \cdots + \zeta_kx_k : \zeta_1, \ldots, \zeta_k \in \mathbb{Z}\} \qquad (1 \le k \le m),$$
where $\langle Y\rangle$ denotes the vector subspace generated by the set $Y$.


Proof Let $S_1$ denote the set of all $\alpha_1 > 0$ such that $\alpha_1y_1 \in \Lambda$ and let $\mu_1$ be the
infimum of all $\alpha_1 \in S_1$. We are going to show that $\mu_1 \in S_1$. If this is not the case there
exist $\alpha_1^{(\nu)} \in S_1$ with $\alpha_1^{(1)} > \alpha_1^{(2)} > \cdots$ and $\alpha_1^{(\nu)} \to \mu_1$ as $\nu \to \infty$. Since the ball
$|x| \le (1 + \mu_1)|y_1|$ contains only finitely many points of $\Lambda$, this is a contradiction.

Any $\alpha_1 \in S_1$ can be written in the form $\alpha_1 = p\mu_1 + \theta$, where $p$ is a positive integer
and $0 \le \theta < \mu_1$. Since $\theta > 0$ would imply $\theta \in S_1$, contrary to the definition of $\mu_1$,
we must have $\theta = 0$. Hence if we put $x_1 = \mu_1y_1$, then

$$\Lambda \cap \langle y_1\rangle = \{\zeta_1x_1 : \zeta_1 \in \mathbb{Z}\}.$$


Assume that, for some positive integer $k$ $(1 \le k < m)$, we have found vectors
$x_1, \ldots, x_k \in \Lambda$ such that

$$\Lambda \cap \langle y_1, \ldots, y_k\rangle = \{\zeta_1x_1 + \cdots + \zeta_kx_k : \zeta_1, \ldots, \zeta_k \in \mathbb{Z}\}.$$

We will prove the proposition by showing that this assumption continues to hold when
$k$ is replaced by $k + 1$.
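Any $x \in \Lambda \cap \langle y_1, \ldots, y_{k+1}\rangle$ has the form

$$x = \alpha_1x_1 + \cdots + \alpha_kx_k + \alpha_{k+1}y_{k+1},$$

where $\alpha_1, \ldots, \alpha_{k+1} \in \mathbb{R}$. Let $S_{k+1}$ denote the set of all $\alpha_{k+1} > 0$ which arise in such
representations and let $\mu_{k+1}$ be the infimum of all $\alpha_{k+1} \in S_{k+1}$. We are going to show
that $\mu_{k+1} \in S_{k+1}$. If $\mu_{k+1} \notin S_{k+1}$, there exist $\alpha_{k+1}^{(\nu)} \in S_{k+1}$ with $\alpha_{k+1}^{(1)} > \alpha_{k+1}^{(2)} > \cdots$
and $\alpha_{k+1}^{(\nu)} \to \mu_{k+1}$ as $\nu \to \infty$. Then $\Lambda$ contains a point

$$z^{(\nu)} = \alpha_1^{(\nu)}x_1 + \cdots + \alpha_k^{(\nu)}x_k + \alpha_{k+1}^{(\nu)}y_{k+1},$$

where $\alpha_j^{(\nu)} \in \mathbb{R}$ $(1 \le j \le k)$. In fact, by subtracting an integral linear combination of
$x_1, \ldots, x_k$ we may assume that $0 \le \alpha_j^{(\nu)} < 1$ $(1 \le j \le k)$. Since only finitely many
points of $\Lambda$ are contained in the ball $|x| \le |x_1| + \cdots + |x_k| + (1 + \mu_{k+1})|y_{k+1}|$, this
is a contradiction.

Hence $\mu_{k+1} > 0$ and $\Lambda$ contains a vector

$$x_{k+1} = \alpha_1x_1 + \cdots + \alpha_kx_k + \mu_{k+1}y_{k+1}.$$

As for $S_1$, it may be seen that $S_{k+1}$ consists of all positive integer multiples of $\mu_{k+1}$.
Hence any $x \in \Lambda \cap \langle y_1, \ldots, y_{k+1}\rangle$ has the form

$$x = \zeta_1x_1 + \cdots + \zeta_kx_k + \zeta_{k+1}x_{k+1},$$
where $\zeta_1, \ldots, \zeta_k \in \mathbb{R}$ and $\zeta_{k+1} \in \mathbb{Z}$. Since

$$x - \zeta_{k+1}x_{k+1} \in \Lambda \cap \langle y_1, \ldots, y_k\rangle,$$

we must actually have $\zeta_1, \ldots, \zeta_k \in \mathbb{Z}$.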


By being more specific in the proof of Proposition 7 it may be shown that there is
a unique choice of $x_1, \ldots, x_m$ such that

$$\begin{aligned}
y_1 &= p_{11}x_1, \\
y_2 &= p_{21}x_1 + p_{22}x_2, \\
&\;\;\vdots \\
y_m &= p_{m1}x_1 + p_{m2}x_2 + \cdots + p_{mm}x_m,
\end{aligned}$$

where $p_{ij} \in \mathbb{Z}$, $p_{ii} > 0$, and $0 \le p_{ij} < p_{ii}$ if $j < i$ (Hermite's normal form).

It is easily seen that in Proposition 7 we can choose $x_i = y_i$ $(1 \le i \le m)$ if and
only if, for any $x \in \Lambda$ and any positive integer $h$, $x$ is an integral linear combination
of $y_1, \ldots, y_m$ whenever $hx$ is.

By combining Propositions 6 and 7 we obtain 


Proposition 8 For a set $\Lambda \subseteq \mathbb{R}^n$ the following two conditions are equivalent:


(i) $\Lambda$ is a discrete subgroup of $\mathbb{R}^n$ and there exists $R > 0$ such that, for each $y \in \mathbb{R}^n$,
there is some $x \in \Lambda$ with $|y - x| < R$;
(ii) there exist $n$ linearly independent vectors $x_1, \ldots, x_n$ in $\mathbb{R}^n$ such that

$$\Lambda = \{\zeta_1x_1 + \cdots + \zeta_nx_n : \zeta_1, \ldots, \zeta_n \in \mathbb{Z}\}.$$


Proof If (i) holds, then in the statement of Proposition 7 we must have $m = n$, i.e.
(ii) holds. On the other hand, if (ii) holds then $\Lambda$ is a discrete subgroup of $\mathbb{R}^n$, by
Proposition 6. Moreover, for any $y \in \mathbb{R}^n$ we can choose $x \in \Lambda$ so that

$$y - x = \theta_1x_1 + \cdots + \theta_nx_n,$$

where $0 \le \theta_j < 1$ $(j = 1, \ldots, n)$, and hence

$$|y - x| < |x_1| + \cdots + |x_n|.$$


A set $\Lambda \subseteq \mathbb{R}^n$ satisfying either of the two equivalent conditions of Proposition 8
will be called a lattice and any element of $\Lambda$ a lattice point. The vectors $x_1, \ldots, x_n$
in (ii) will be said to be a basis for the lattice.

A lattice is sometimes defined to be any discrete subgroup of R”, and what we 
have called a lattice is then called a ‘nondegenerate’ lattice. Our definition is chosen 
simply to avoid repetition of the word ‘nondegenerate’. We may occasionally use the 
more general definition and, with this warning, we believe it will be clear from the context
when this occurs.
The basis of a lattice is not uniquely determined. In fact $y_1, \ldots, y_n$ is also a basis if

$$y_j = \sum_{k=1}^n \alpha_{jk}x_k \qquad (j = 1, \ldots, n),$$

where $A = (\alpha_{jk})$ is an $n \times n$ matrix of integers such that $\det A = \pm 1$, since $A^{-1}$ is
then also a matrix of integers. Moreover, every basis $y_1, \ldots, y_n$ is obtained in this way.
For if

$$y_j = \sum_{k=1}^n \alpha_{jk}x_k, \qquad x_k = \sum_{j=1}^n \beta_{kj}y_j \qquad (j, k = 1, \ldots, n),$$

where $A = (\alpha_{jk})$ and $B = (\beta_{kj})$ are $n \times n$ matrices of integers, then $BA = I$ and hence
$(\det B)(\det A) = 1$. Since $\det A$ and $\det B$ are integers, it follows that $\det A = \pm 1$.
Let $x_1, \ldots, x_n$ be a basis for a lattice $\Lambda \subseteq \mathbb{R}^n$. If

$$x_k = \sum_{j=1}^n \gamma_{jk}e_j \qquad (k = 1, \ldots, n),$$

where $e_1, \ldots, e_n$ is the canonical basis for $\mathbb{R}^n$ then, in terms of the nonsingular matrix
$T = (\gamma_{jk})$, the lattice $\Lambda$ is just the set of all vectors $Tz$ with $z \in \mathbb{Z}^n$. The absolute
value of the determinant of the matrix $T$ does not depend on the choice of basis. For if
$x_1', \ldots, x_n'$ is any other basis, then

$$x_i' = \sum_{j=1}^n \alpha_{ij}x_j \qquad (i = 1, \ldots, n),$$

where $A = (\alpha_{ij})$ is an $n \times n$ matrix of integers with $\det A = \pm 1$. Thus

$$x_i' = \sum_{k=1}^n \gamma_{ki}'e_k \qquad (i = 1, \ldots, n),$$

where $T' = (\gamma_{jk}')$ satisfies $T' = TA^t$ and hence
$$|\det T'| = |\det T|.$$


The uniquely determined quantity $|\det T|$ will be called the determinant of the lattice
$\Lambda$ and denoted by $d(\Lambda)$. (Some authors, e.g. Conway and Sloane [14], call $|\det T|^2$
the determinant of $\Lambda$, but others prefer to call this the discriminant of $\Lambda$.)
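The invariance of $|\det T|$ under a change of basis can be checked on a small example. The Python sketch below uses an exact rational determinant and a basis matrix $T$ and unimodular $A$ that we chose arbitrarily. It forms $T' = TA^t$ and compares $|\det T|$ with $|\det T'|$. The helper names are ours.

```python
from fractions import Fraction

# Hypothetical helper names; an illustrative check only.
def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    n, result = len(M), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            result = -result
        result *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return result

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

T = [[2, 1, 0], [0, 3, 1], [1, 0, 4]]   # columns are the basis vectors x_1, x_2, x_3
A = [[1, 2, 0], [0, 1, 0], [3, 5, 1]]   # integer matrix with det A = 1
T_new = matmul(T, transpose(A))          # T' = T A^t: column i is x'_i = sum_j alpha_ij x_j
print(det(A), abs(det(T)), abs(det(T_new)))  # 1 25 25
```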

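The determinant $d(\Lambda)$ has a simple geometrical interpretation. In fact it is the
volume of the parallelotope $\Pi$, consisting of all points $y \in \mathbb{R}^n$ such that

$$y = \theta_1x_1 + \cdots + \theta_nx_n,$$

where $0 \le \theta_k \le 1$ $(k = 1, \ldots, n)$. The interior of $\Pi$ is a fundamental domain for the
subgroup $\Lambda$, since

$$\mathbb{R}^n = \bigcup_{x \in \Lambda} (\Pi + x), \qquad \operatorname{int}(\Pi + x) \cap \operatorname{int}(\Pi + x') = \emptyset \text{ if } x, x' \in \Lambda \text{ and } x \neq x'.$$


For any lattice $\Lambda \subseteq \mathbb{R}^n$, the set $\Lambda^*$ of all vectors $y \in \mathbb{R}^n$ such that $y^tx \in \mathbb{Z}$ for
every $x \in \Lambda$ is again a lattice, the dual (or ‘polar’ or ‘reciprocal’) of $\Lambda$. In fact,

$$\text{if } \Lambda = \{Tz : z \in \mathbb{Z}^n\}, \text{ then } \Lambda^* = \{(T^t)^{-1}w : w \in \mathbb{Z}^n\}.$$


Hence $\Lambda$ is the dual of $\Lambda^*$ and $d(\Lambda)d(\Lambda^*) = 1$. A lattice $\Lambda$ is self-dual if $\Lambda^* = \Lambda$.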
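The description of the dual lattice is easy to test in two dimensions. This Python sketch (our own example basis and helper name) computes $(T^t)^{-1}$ and checks that the pairings $y_i^tx_j$ of dual and primal basis vectors form the identity matrix, so that $y^tx \in \mathbb{Z}$ for all $y \in \Lambda^*$ and $x \in \Lambda$. It also confirms $d(\Lambda)d(\Lambda^*) = 1$.

```python
from fractions import Fraction

# Hypothetical helper name; an illustrative 2 x 2 example only.
def dual_basis_2x2(T):
    """Return (T^t)^{-1} for a nonsingular 2 x 2 matrix T = [[a, b], [c, d]]."""
    (a, b), (c, d) = T
    det = Fraction(a * d - b * c)
    # T^t = [[a, c], [b, d]], whose inverse is [[d, -c], [-b, a]] / det.
    return [[d / det, -c / det], [-b / det, a / det]]

T = [[2, 1], [0, 3]]      # lattice basis vectors (2, 0) and (1, 3) as columns
S = dual_basis_2x2(T)     # dual basis vectors as columns

# Entry (i, j) is y_i^t x_j, i.e. the matrix S^t T.
pairings = [[sum(S[k][i] * T[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
det_T = T[0][0] * T[1][1] - T[0][1] * T[1][0]
det_S = S[0][0] * S[1][1] - S[0][1] * S[1][0]
print(pairings == [[1, 0], [0, 1]], abs(det_T) * abs(det_S))  # True 1
```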
In this section we take up the proof of Minkowski's lattice point theorem. The proof
will be based on a very general result, due to Blichfeldt (1914), which is not restricted 
to convex sets. 


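Proposition 9 Let $S$ be a Lebesgue measurable subset of $\mathbb{R}^n$, $\Lambda$ a lattice in $\mathbb{R}^n$ with
determinant $d(\Lambda)$ and $m$ a positive integer.

If $\lambda(S) > m\,d(\Lambda)$, or if $S$ is compact and $\lambda(S) = m\,d(\Lambda)$, then there exist $m + 1$
distinct points $x_1, \ldots, x_{m+1}$ of $S$ such that the differences $x_j - x_k$ $(1 \le j, k \le m + 1)$
all lie in $\Lambda$.


Proof Let $b_1, \ldots, b_n$ be a basis for $\Lambda$ and let $P$ be the half-open parallelotope
consisting of all points $x = \theta_1b_1 + \cdots + \theta_nb_n$, where $0 \le \theta_j < 1$ $(j = 1, \ldots, n)$.
Then $\lambda(P) = d(\Lambda)$ and

$$\mathbb{R}^n = \bigcup_{z \in \Lambda} (P + z), \qquad (P + z) \cap (P + z') = \emptyset \text{ if } z \neq z'.$$


Suppose first that $\lambda(S) > m\,d(\Lambda)$. If we put

$$S_z = S \cap (P + z), \qquad T_z = S_z - z,$$

then $T_z \subseteq P$, $\lambda(T_z) = \lambda(S_z)$ and
$$\lambda(S) = \sum_{z \in \Lambda} \lambda(S_z).$$
Hence

$$\sum_{z \in \Lambda} \lambda(T_z) = \lambda(S) > m\,d(\Lambda) = m\lambda(P).$$

Since $T_z \subseteq P$ for every $z$, it follows that some point $y \in P$ is contained in at least
$m + 1$ sets $T_z$, since otherwise the sum of the indicator functions of the sets $T_z$ would be at most $m$ on $P$,
and integrating over $P$ would give $\sum_z \lambda(T_z) \le m\lambda(P)$. (In fact this must hold for all $y$ in a subset of $P$ of positive measure.)
Thus there exist $m + 1$ distinct points $z_1, \ldots, z_{m+1}$ of $\Lambda$ and points $x_1, \ldots, x_{m+1}$ of $S$
such that $y = x_j - z_j$ $(j = 1, \ldots, m + 1)$. Then $x_1, \ldots, x_{m+1}$ are distinct and

$$x_j - x_k = z_j - z_k \in \Lambda \qquad (1 \le j, k \le m + 1).$$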


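Suppose next that $S$ is compact and $\lambda(S) = m\,d(\Lambda)$. Let $\{\varepsilon_\nu\}$ be a decreasing
sequence of positive numbers such that $\varepsilon_\nu \to 0$ as $\nu \to \infty$, and let $S_\nu$ denote the set of
all points of $\mathbb{R}^n$ distant at most $\varepsilon_\nu$ from $S$. Then $S_\nu$ is compact, $\lambda(S_\nu) > \lambda(S)$ and

$$S_1 \supseteq S_2 \supseteq \cdots, \qquad S = \bigcap_{\nu} S_\nu.$$

By what we have already proved, there exist $m + 1$ distinct points $x_1^{(\nu)},\ldots,x_{m+1}^{(\nu)}$
of $S_\nu$ such that $x_j^{(\nu)} - x_k^{(\nu)} \in \Lambda$ for all $j,k$. Since $S_\nu \subseteq S_1$ and $S_1$ is compact, by
restricting attention to a subsequence we may assume that $x_j^{(\nu)} \to x_j$ as $\nu \to \infty$
$(j = 1,\ldots,m+1)$. Then $x_j \in S$ and $x_j^{(\nu)} - x_k^{(\nu)} \to x_j - x_k$. Since $x_j^{(\nu)} - x_k^{(\nu)} \in \Lambda$
and $\Lambda$ is discrete, this is only possible if $x_j - x_k = x_j^{(\nu)} - x_k^{(\nu)}$ for all large $\nu$.
Hence $x_1,\ldots,x_{m+1}$ are distinct, and their differences lie in $\Lambda$.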
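The first half of the proof is a pigeonhole argument on the sets $T_z$, and it can be watched numerically. The sketch below is illustrative only. It takes $\Lambda = \mathbb{Z}^2$ and a set $S$ given by a membership test, here a hypothetical open disc of radius 1.3, so that $\lambda(S) \approx 5.31 > m = 5$. It scans $y$ over a grid in $P = [0,1)^2$ and counts the $z$ with $y \in T_z$, that is, with $y + z \in S$. It then reports a point covered at least $m + 1$ times. The grid average of these counts is a Riemann sum for $\sum_z \lambda(T_z) = \lambda(S)$. The function name `blichfeldt_witness`, the grid size and the range of shifts are assumptions of the sketch.

```python
import itertools
import math


def blichfeldt_witness(in_S, m, grid=100, zrange=3):
    """Search a grid in P = [0,1)^2 (Lambda = Z^2) for a point y lying in at least
    m + 1 of the sets T_z = (S ∩ (P + z)) - z.

    Returns (witness, mean_count): witness is (y, [z_1, ..., z_{m+1}]) or None, and
    mean_count is the grid average of #{z : y + z in S}.  Assumes every z with
    (P + z) ∩ S nonempty has coordinates in [-zrange, zrange].
    """
    shifts = list(itertools.product(range(-zrange, zrange + 1), repeat=2))
    total_hits = 0
    witness = None
    for i in range(grid):
        for j in range(grid):
            y = ((i + 0.5) / grid, (j + 0.5) / grid)  # midpoint of a grid cell
            # y lies in T_z exactly when y + z lies in S
            zs = [z for z in shifts if in_S(y[0] + z[0], y[1] + z[1])]
            total_hits += len(zs)
            if witness is None and len(zs) >= m + 1:
                witness = (y, zs[: m + 1])
    return witness, total_hits / grid**2


# Hypothetical example: S is the open disc of radius 1.3 about the origin.
r, m = 1.3, 5
area = math.pi * r * r  # about 5.31, which exceeds m
witness, mean_count = blichfeldt_witness(lambda u, v: u * u + v * v < r * r, m)
y, zs = witness
points = [(y[0] + z[0], y[1] + z[1]) for z in zs]  # the points x_j = y + z_j
print(y, zs, round(mean_count, 3), round(area, 3))
```

Since the counts are integers and their average exceeds $m$, some grid point must be covered at least $m + 1$ times. This is the discrete form of the measure argument in the proof.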


Siegel (1935) has given an analytic formula which underlies Proposition 9 and 
enables it to be generalized. Although we will make no use of it, this formula will now 
be established. For notational simplicity we restrict attention to the (self-dual) lattice
$\Lambda = \mathbb{Z}^n$.


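Proposition 10 If $\psi : \mathbb{R}^n \to \mathbb{C}$ is a bounded measurable function which vanishes
outside some compact set, then

$$\int_{\mathbb{R}^n} \psi(x)\,\overline{\phi(x)}\,dx = \sum_{w \in \mathbb{Z}^n} \left| \int_{\mathbb{R}^n} \psi(x)\,e^{-2\pi i w^t x}\,dx \right|^2,$$

where

$$\phi(x) = \sum_{z \in \mathbb{Z}^n} \psi(x + z).$$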


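Proof Since $\psi$ vanishes outside a compact set, there exists a finite set $T \subseteq \mathbb{Z}^n$ such
that $\psi(x + z) = 0$ for all $x$ in the unit cube $[0,1]^n$ if $z \in \mathbb{Z}^n \setminus T$. Thus for $x$ in this
cube the sum defining $\phi(x)$ has only finitely many nonzero terms, and $\phi$ is bounded and
measurable there. By the periodicity established below, $\phi$ is a bounded measurable function on all of $\mathbb{R}^n$.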

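If we write

$$x = (\xi_1,\ldots,\xi_n), \qquad z = (\zeta_1,\ldots,\zeta_n),$$

then the sum defining $\phi(x)$ is unaltered by the substitution $\xi_j \to \xi_j + 1$ and hence $\phi$
has period 1 in each of the variables $\xi_j$ $(j = 1,\ldots,n)$. Let $\Pi$ denote the fundamental
parallelotope

$$\Pi = \{x = (\xi_1,\ldots,\xi_n) \in \mathbb{R}^n : 0 \le \xi_j < 1 \text{ for } j = 1,\ldots,n\}.$$

Since the functions $e^{2\pi i w^t x}$ $(w \in \mathbb{Z}^n)$ are an orthonormal basis for $L^2(\Pi)$, Parseval’s
equality (Chapter I, §10) holds:

$$\int_\Pi |\phi(x)|^2\,dx = \sum_{w \in \mathbb{Z}^n} |c_w|^2,$$

where

$$c_w = \int_\Pi \phi(x)\,e^{-2\pi i w^t x}\,dx.$$

But

$$c_w = \int_\Pi \sum_{z \in \mathbb{Z}^n} \psi(x + z)\,e^{-2\pi i w^t (x + z)}\,dx,$$

since $e^{2\pi i k} = 1$ for any integer $k$. Hence

$$c_w = \int_{\mathbb{R}^n} \psi(y)\,e^{-2\pi i w^t y}\,dy.$$

On the other hand,

$$\begin{aligned}
\int_\Pi |\phi(x)|^2\,dx &= \int_\Pi \sum_{z,z' \in \mathbb{Z}^n} \psi(x + z)\,\overline{\psi(x + z')}\,dx \\
&= \int_\Pi \sum_{z,z' \in \mathbb{Z}^n} \psi(x + z)\,\overline{\psi(x + z + z')}\,dx \\
&= \int_{\mathbb{R}^n} \sum_{z \in \mathbb{Z}^n} \psi(y)\,\overline{\psi(y + z)}\,dy = \int_{\mathbb{R}^n} \psi(y)\,\overline{\phi(y)}\,dy.
\end{aligned}$$

Substituting these expressions in Parseval’s equality, we obtain the result.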
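Proposition 10 can be checked numerically in the simplest case. The sketch below assumes $n = 1$ and takes $\psi$ to be the indicator function of $[0, a]$; this choice is only an illustration. The left side is computed from the third line of the chain above: $\sum_z \int \psi(y)\psi(y+z)\,dy = \sum_z \max(0, a - |z|)$. The right side uses $|\hat\psi(0)|^2 = a^2$ and $|\hat\psi(w)|^2 = \sin^2(\pi w a)/(\pi w)^2$ for $w \ne 0$. Because the series is truncated, the two sides agree only approximately.

```python
import math


def siegel_lhs(a):
    """Left side for n = 1 and psi = indicator of [0, a]: the sum over z of the
    length of [0, a] ∩ ([0, a] - z), which is max(0, a - |z|)."""
    k = math.ceil(a)
    return sum(max(0.0, a - abs(z)) for z in range(-k, k + 1))


def siegel_rhs(a, terms=200_000):
    """Right side, truncated after |w| = terms:
    a^2 + 2 * sum_{w=1}^{terms} sin^2(pi w a) / (pi w)^2."""
    total = a * a  # w = 0 term: (integral of psi)^2
    for w in range(1, terms + 1):
        # the terms for w and -w are equal, hence the factor 2
        total += 2 * (math.sin(math.pi * w * a) / (math.pi * w)) ** 2
    return total


for a in (0.5, 1.3, 1.5, 2.7):
    print(a, siegel_lhs(a), round(siegel_rhs(a), 5))
```

For $a = 1.5$, for instance, the left side is $1.5 + 2(0.5) = 2.5$. The right side is $2.25 + (2/\pi^2)\sum_{w \text{ odd}} w^{-2} = 2.25 + 1/4$.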


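Suppose, in particular, that $\psi$ takes only real nonnegative values. Then so also does
$\phi$ and

$$\int_{\mathbb{R}^n} \psi(x)\phi(x)\,dx \le \sup_{x \in \mathbb{R}^n} \phi(x) \int_{\mathbb{R}^n} \psi(x)\,dx.$$

On the other hand, omitting all terms with $w \ne 0$ we obtain

$$\sum_{w \in \mathbb{Z}^n} \left| \int_{\mathbb{R}^n} \psi(x)\,e^{-2\pi i w^t x}\,dx \right|^2 \ge \left( \int_{\mathbb{R}^n} \psi(x)\,dx \right)^2.$$

Hence, by Proposition 10,

$$\sup_{x \in \mathbb{R}^n} \phi(x) \ge \int_{\mathbb{R}^n} \psi(x)\,dx.$$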
xeR" R" 

For example, let $S \subseteq \mathbb{R}^n$ be a measurable set with $\lambda(S) > m$. Then there exists
a bounded measurable set $S' \subseteq S$ with $\lambda(S') > m$. If we take $\psi$ to be the indicator
function of $S'$, then

$$\int_{\mathbb{R}^n} \psi(x)\,dx = \lambda(S') > m$$

and we conclude that there exists $y \in \mathbb{R}^n$ such that

$$\sum_{z \in \mathbb{Z}^n} \psi(y + z) = \phi(y) > m.$$

Since the only possible values of the summands on the left are 0 and 1, it follows that
there exist $m + 1$ distinct points $z_1,\ldots,z_{m+1} \in \mathbb{Z}^n = \Lambda$ such that $y + z_j \in S$ for all $j$.
The proof of Proposition 9 can now be completed in the same way as before.

Let $\{K_\alpha\}$ be a family of subsets of $\mathbb{R}^n$, where each $K_\alpha$ is the closure of a nonempty
open set $G_\alpha$, i.e. $K_\alpha$ is the intersection of all closed sets containing $G_\alpha$. The family
$\{K_\alpha\}$ is said to be a packing of $\mathbb{R}^n$ if $\alpha \ne \alpha'$ implies $G_\alpha \cap G_{\alpha'} = \emptyset$ and is said to be
a covering of $\mathbb{R}^n$ if $\mathbb{R}^n = \bigcup_\alpha K_\alpha$. It is said to be a tiling of $\mathbb{R}^n$ if it is both a packing
and a covering.

For example, if $\Pi$ is a fundamental parallelotope of a lattice $\Lambda$, then the family
$\{\Pi + a : a \in \Lambda\}$ is a tiling of $\mathbb{R}^n$. More generally, if $G$ is a nonempty open subset
of $\mathbb{R}^n$ with closure $K$, we may ask whether the family $\{K + a : a \in \Lambda\}$ of all
$\Lambda$-translates of $K$ is either a packing or a covering of $\mathbb{R}^n$. Some necessary conditions
may be derived with the aid of Proposition 9:


Proposition 11 Let $K$ be the closure of a bounded nonempty open set $G \subseteq \mathbb{R}^n$ and
let $\Lambda$ be a lattice in $\mathbb{R}^n$.

If the $\Lambda$-translates of $K$ are a covering of $\mathbb{R}^n$ then $\lambda(K) \ge d(\Lambda)$, and the inequality
is strict if they are not also a packing.

If the $\Lambda$-translates of $K$ are a packing of $\mathbb{R}^n$ then $\lambda(K) \le d(\Lambda)$, and the inequality
is strict if they are not also a covering.


Proof Suppose first that the $\Lambda$-translates of $K$ cover $\mathbb{R}^n$. Then every point of a fundamental
parallelotope $\Pi$ of $\Lambda$ has the form $x - a$, where $x \in K$ and $a \in \Lambda$. Hence

$$\lambda(K) = \sum_{a \in \Lambda} \lambda(K \cap (\Pi + a)) = \sum_{a \in \Lambda} \lambda((K - a) \cap \Pi) \ge \lambda(\Pi) = d(\Lambda).$$

Suppose, in addition, that the $\Lambda$-translates of $K$ are not a packing of $\mathbb{R}^n$. Then there
exist distinct points $x_1, x_2$ in the interior $G$ of $K$ such that $a = x_1 - x_2 \in \Lambda$. Let

$$B_\varepsilon = \{x \in \mathbb{R}^n : |x| \le \varepsilon\}.$$

We can choose $\varepsilon > 0$ so small that the balls $B_\varepsilon + x_1$ and $B_\varepsilon + x_2$ are disjoint and
contained in $G$. Then $G' = G \setminus (B_\varepsilon + x_1)$ is a bounded nonempty open set with closure
$K' = K \setminus (\operatorname{int} B_\varepsilon + x_1)$. Since

$$B_\varepsilon + x_1 = B_\varepsilon + x_2 + a \subseteq K' + a,$$

the $\Lambda$-translates of $K'$ contain $K$ and therefore also cover $\mathbb{R}^n$. Hence, by what we have
already proved, $\lambda(K') \ge d(\Lambda)$. Since $\lambda(K) > \lambda(K')$, it follows that $\lambda(K) > d(\Lambda)$.

Suppose now that the $\Lambda$-translates of $K$ are a packing of $\mathbb{R}^n$. Then $\Lambda$ does not
contain the difference of two distinct points in the interior $G$ of $K$, since $G + a$ and
$G + b$ are disjoint if $a, b$ are distinct points of $\Lambda$. It follows from Proposition 9 that

$$\lambda(K) = \lambda(G) \le d(\Lambda).$$

Suppose, in addition, that the $\Lambda$-translates of $K$ do not cover $\mathbb{R}^n$. Thus there exists
a point $y \in \mathbb{R}^n$ which is not in any $\Lambda$-translate of $K$. We will show that we can choose
$\varepsilon > 0$ so small that $y$ is not in any $\Lambda$-translate of $K + B_\varepsilon$.

If this is not the case then, for any positive integer $\nu$, there exists $a_\nu \in \Lambda$ such that

$$y \in K + B_{1/\nu} + a_\nu.$$

Evidently the sequence $a_\nu$ is bounded and hence there exists $a \in \Lambda$ such that $a_\nu = a$
for infinitely many $\nu$. But then, since $K$ is closed, $y \in K + a$, which is contrary to hypothesis.

We may in addition assume $\varepsilon$ chosen so small that $|x| > 2\varepsilon$ for every nonzero
$x \in \Lambda$. Then the set $S = G \cup (B_\varepsilon + y)$ has the property that $\Lambda$ does not contain the
difference of any two distinct points of $S$. Hence, by Proposition 9, $\lambda(S) \le d(\Lambda)$. Since

$$\lambda(K) = \lambda(G) < \lambda(S),$$

it follows that $\lambda(K) < d(\Lambda)$.


We next apply Proposition 9 to convex sets. Minkowski’s lattice point theorem 
(Theorem 1) is the special case $m = 1$ (and $\Lambda = \mathbb{Z}^n$) of the following generalization,
due to van der Corput (1936): 


Proposition 12 Let $C$ be a symmetric convex subset of $\mathbb{R}^n$, $\Lambda$ a lattice in $\mathbb{R}^n$ with
determinant $d(\Lambda)$, and $m$ a positive integer.

If $\lambda(C) > 2^n m\,d(\Lambda)$, or if $C$ is compact and $\lambda(C) = 2^n m\,d(\Lambda)$, then there exist
$2m$ distinct nonzero points $\pm y_1,\ldots,\pm y_m$ of $\Lambda$ such that

$$y_j \in C \quad (1 \le j \le m), \qquad y_j - y_k \in C \quad (1 \le j,k \le m).$$

Proof The set $S = \{x/2 : x \in C\}$ has measure $\lambda(S) = \lambda(C)/2^n$. Hence, by Proposition 9,
there exist $m + 1$ distinct points $x_1,\ldots,x_{m+1} \in C$ such that $(x_j - x_k)/2 \in \Lambda$
for all $j,k$.

The vectors of $\mathbb{R}^n$ may be totally ordered by writing $x > x'$ if $x - x'$ has its first
nonzero coordinate positive. We assume the points $x_1,\ldots,x_{m+1} \in C$ numbered so that

$$x_1 > x_2 > \cdots > x_{m+1}.$$

Put

$$y_j = (x_j - x_{m+1})/2 \quad (j = 1,\ldots,m).$$

Then, by construction, $y_j \in \Lambda$ $(j = 1,\ldots,m)$. Moreover $y_j \in C$, since $x_j, -x_{m+1} \in C$
and $C$ is symmetric and convex, and similarly $y_j - y_k = (x_j - x_k)/2 \in C$. Finally, since

$$y_1 > y_2 > \cdots > y_m > O,$$

we have $y_j \ne O$ and $y_j \ne \pm y_k$ if $j \ne k$.
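The construction in this proof can be carried out mechanically. In the sketch below, which is an illustration and not part of the text, $\Lambda = \mathbb{Z}^2$ and $C$ is the compact box $|\xi_1| \le 3$, $|\xi_2| \le 1$. Then $\lambda(C) = 12 = 2^2 \cdot 3 \cdot d(\Lambda)$ with $m = 3$. The $m + 1$ points of $C$ whose differences lie in $2\Lambda$ are supplied by hand rather than produced by Proposition 9. Python compares tuples lexicographically, which is exactly the total order used above. The helper names are hypothetical.

```python
from fractions import Fraction


def corput_points(xs):
    """Given m + 1 distinct points of C whose pairwise differences lie in 2*Lambda,
    order them as x_1 > ... > x_{m+1} and return y_j = (x_j - x_{m+1}) / 2."""
    xs = sorted(xs, reverse=True)  # lexicographic order, largest first
    last = xs[-1]
    return [tuple(Fraction(a - b, 2) for a, b in zip(x, last)) for x in xs[:-1]]


def in_C(p):
    # hypothetical compact symmetric convex set: the box |xi_1| <= 3, |xi_2| <= 1
    return abs(p[0]) <= 3 and abs(p[1]) <= 1


xs = [(-3, 1), (-1, -1), (1, 1), (3, -1)]  # all in C, pairwise differences in 2*Z^2
ys = corput_points(xs)
print(ys)
```

The resulting $y_j$ are $(3,-1)$, $(2,0)$ and $(1,-1)$. Each of them, and each difference $y_j - y_k$, lies in $C$. Together with their negatives they give the $2m = 6$ distinct nonzero lattice points promised by the proposition.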


The conclusion of Proposition 12 need no longer hold if $C$ is not compact and
$\lambda(C) = 2^n m\,d(\Lambda)$. For example, take $\Lambda = \mathbb{Z}^n$ and let $C$ be the symmetric convex set

$$C = \{x = (\xi_1,\ldots,\xi_n) \in \mathbb{R}^n : |\xi_1| < m,\ |\xi_j| < 1 \text{ for } 2 \le j \le n\}.$$

Then $d(\Lambda) = 1$ and $\lambda(C) = 2^n m$. However, the only nonzero points of $\Lambda$ in $C$ are the
$2(m-1)$ points $(\pm k, 0,\ldots,0)$ $(1 \le k \le m-1)$.
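A brute-force count confirms this example for small parameters. The sketch lists the nonzero points of $\mathbb{Z}^n$ in the open box $C$. For contrast, it also lists them in the closure of $C$, which is compact, so that the compact case of Proposition 12 applies there. The function name is hypothetical.

```python
import itertools


def nonzero_points(m, n, closed=False):
    """Nonzero points of Z^n with |xi_1| < m and |xi_j| < 1 for j >= 2,
    or with <= in place of < when closed=True."""
    lt = (lambda a, b: a <= b) if closed else (lambda a, b: a < b)
    return [
        z
        for z in itertools.product(range(-m, m + 1), repeat=n)
        if any(z) and lt(abs(z[0]), m) and all(lt(abs(c), 1) for c in z[1:])
    ]


m, n = 3, 2
open_pts = nonzero_points(m, n)
closed_pts = nonzero_points(m, n, closed=True)
print(len(open_pts), len(closed_pts))  # prints 4 20
```

The open box contains only $2(m-1) = 4$ nonzero lattice points, fewer than the $2m = 6$ that Proposition 12 would give.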

To provide a broader view of the geometry of numbers we now mention
without proof some further results. A different generalization of Minkowski’s lattice
point theorem was already proved by Minkowski himself. Let $\Lambda$ be a lattice in $\mathbb{R}^n$ and
let $K$ be a compact symmetric convex subset of $\mathbb{R}^n$ with nonempty interior. Then $\rho K$
contains no nonzero point of $\Lambda$ for small $\rho > 0$ and contains $n$ linearly independent
points of $\Lambda$ for large $\rho > 0$. Let $\mu_i$ denote the infimum of all $\rho > 0$ such that $\rho K$
contains at least $i$ linearly independent points of $\Lambda$ $(i = 1,\ldots,n)$. Clearly the
successive minima $\mu_i = \mu_i(K,\Lambda)$ satisfy the inequalities

$$0 < \mu_1 \le \mu_2 \le \cdots \le \mu_n < \infty.$$

Minkowski’s lattice point theorem says that

$$\mu_1^n\,\lambda(K) \le 2^n d(\Lambda).$$

Minkowski’s theorem on successive minima strengthens this to

$$2^n d(\Lambda)/n! \le \mu_1\mu_2\cdots\mu_n\,\lambda(K) \le 2^n d(\Lambda).$$

The lower bound is quite easy to prove, but the upper bound is more deep-lying,
notwithstanding simplifications of Minkowski’s original proof. If $\Lambda = \mathbb{Z}^n$, then
equality holds in the upper bound for the cube $K = \{(\xi_1,\ldots,\xi_n) \in \mathbb{R}^n : |\xi_i| \le 1 \text{ for all } i\}$
and in the lower bound for the cross-polytope $K = \{(\xi_1,\ldots,\xi_n) \in \mathbb{R}^n : |\xi_1| + \cdots + |\xi_n| \le 1\}$.
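For $n = 2$ the successive minima can be found by brute force, which makes Minkowski’s two-sided bound easy to test. The sketch makes three assumptions. The lattice is given by a basis matrix $T$ whose columns are the basis vectors. $K$ is given by its gauge function, i.e. the norm whose unit ball is $K$, as discussed at the end of this section. Coefficients in $[-4, 4]$ suffice to reach both minima, which holds for the small, nearly orthogonal examples used here but not in general. Function names are illustrative.

```python
import itertools
import math

import numpy as np


def successive_minima_2d(T, gauge, R=4):
    """mu_1, mu_2 of K (given by its gauge) with respect to the lattice T Z^2."""
    vecs = [T @ np.array(c, dtype=float)
            for c in itertools.product(range(-R, R + 1), repeat=2) if any(c)]
    v1 = min(vecs, key=gauge)
    mu1 = gauge(v1)
    # mu_2 is the least gauge of a lattice vector not parallel to v1
    mu2 = min(gauge(v) for v in vecs if abs(v1[0] * v[1] - v1[1] * v[0]) > 1e-12)
    return mu1, mu2


def minkowski_check(T, gauge, volume):
    """Return (mu_1 mu_2 lambda(K), 2^2 d / 2!, 2^2 d) for n = 2."""
    mu1, mu2 = successive_minima_2d(T, gauge)
    d = abs(np.linalg.det(T))
    return mu1 * mu2 * volume, 4 * d / math.factorial(2), 4 * d


cube = lambda v: float(max(abs(v[0]), abs(v[1])))   # unit ball: square, area 4
cross = lambda v: float(abs(v[0]) + abs(v[1]))      # unit ball: cross-polytope, area 2
disc = lambda v: float(math.hypot(v[0], v[1]))      # unit ball: disc, area pi
hexagonal = np.array([[1.0, 0.5], [0.0, math.sqrt(3) / 2]])

print(minkowski_check(np.eye(2), cube, 4.0))
print(minkowski_check(np.eye(2), cross, 2.0))
print(minkowski_check(hexagonal, disc, math.pi))
```

For $\mathbb{Z}^2$ the cube attains the upper bound and the cross-polytope attains the lower bound, as stated above.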

If $K$ is a compact symmetric convex subset of $\mathbb{R}^n$ with nonempty interior, we
define its critical determinant $\Delta(K)$ to be the infimum, over all lattices $\Lambda$ with no
nonzero point in the interior of $K$, of their determinants $d(\Lambda)$. A lattice $\Lambda$ for which
$d(\Lambda) = \Delta(K)$ is called a critical lattice for $K$. It will be shown in §6 that a critical
lattice always exists.

It follows from Proposition 12 that $\Delta(K) \ge 2^{-n}\lambda(K)$. A conjectured sharpening
of Minkowski’s theorem on successive minima, which has been proved by Minkowski
(1896) himself for $n = 2$ and for $n$-dimensional ellipsoids, and by Woods (1956) for
$n = 3$, claims that

$$\mu_1\mu_2\cdots\mu_n\,\Delta(K) \le d(\Lambda).$$


The successive minima of a convex body are connected with those of its dual body.
If $K$ is a compact symmetric convex subset of $\mathbb{R}^n$ with nonempty interior, then its dual

$$K^* = \{y \in \mathbb{R}^n : y^t x \le 1 \text{ for all } x \in K\}$$

has the same properties, and $K$ is the dual of $K^*$. Mahler (1939) showed that the
successive minima of the dual body $K^*$ with respect to the dual lattice $\Lambda^*$ are related
to the successive minima of $K$ with respect to $\Lambda$ by the inequalities

$$1 \le \mu_i(K,\Lambda)\,\mu_{n-i+1}(K^*,\Lambda^*) \quad (i = 1,\ldots,n),$$

and hence, by applying Minkowski’s theorem on successive minima also to $K^*$ and
$\Lambda^*$, he obtained inequalities in the opposite direction:

$$\mu_i(K,\Lambda)\,\mu_{n-i+1}(K^*,\Lambda^*) \le 4^n/(\lambda(K)\lambda(K^*)) \quad (i = 1,\ldots,n).$$

By further proving that $\lambda(K)\lambda(K^*) \ge 4^n (n!)^{-2}$, he deduced that

$$\mu_i(K,\Lambda)\,\mu_{n-i+1}(K^*,\Lambda^*) \le (n!)^2 \quad (i = 1,\ldots,n).$$

Dramatic improvements of these bounds have recently been obtained. Banaszczyk
(1996), with the aid of techniques from harmonic analysis, has shown that there is
a numerical constant $C > 0$ such that, for all $n \ge 1$ and all $i \in \{1,\ldots,n\}$,

$$\mu_i(K,\Lambda)\,\mu_{n-i+1}(K^*,\Lambda^*) \le Cn(1 + \log n).$$

He had shown already (1993) that if $K = B_1$ is the $n$-dimensional closed unit ball,
which is self-dual, then for all $n \ge 1$ and all $i \in \{1,\ldots,n\}$,

$$\mu_i(B_1,\Lambda)\,\mu_{n-i+1}(B_1,\Lambda^*) \le n.$$

This result is close to being best possible, since there exists a numerical constant
$C' > 0$ and self-dual lattices $\Lambda_n \subseteq \mathbb{R}^n$ such that

$$\mu_1(B_1,\Lambda_n)\,\mu_n(B_1,\Lambda_n) \ge \mu_1(B_1,\Lambda_n)^2 \ge C'n.$$


Two other applications of Minkowski’s theorem on successive minima will be mentioned
here. The first is a sharp form, due to Bombieri and Vaaler (1983), of ‘Siegel’s
lemma’. In his investigations on transcendental numbers Siegel (1929) used Dirichlet’s
pigeonhole principle to prove that if $A = (\alpha_{jk})$ is an $m \times n$ matrix of integers, where
$m < n$, such that $|\alpha_{jk}| \le \beta$ for all $j,k$, then the system of homogeneous linear
equations

$$Ax = 0$$

has a solution $x = (\xi_k)$ in integers, not all 0, such that $|\xi_k| \le 1 + (n\beta)^{m/(n-m)}$ for all $k$.
Bombieri and Vaaler show that, if $A$ has rank $m$ and if $g > 0$ is the greatest common
divisor of all $m \times m$ subdeterminants of $A$, then there are $n - m$ linearly independent
integral solutions $x_j = (\xi_{jk})$ $(j = 1,\ldots,n-m)$ such that

$$\prod_{j=1}^{n-m} \|x_j\| \le \left(\det(AA^t)\right)^{1/2}/g,$$

where $\|x_j\| = \max_k |\xi_{jk}|$.
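Both statements can be checked by exhaustive search on a small example. The sketch assumes the hypothetical $1 \times 3$ matrix $A = (3\ 5\ 7)$, so that $m = 1$, $n = 3$ and $\beta = 7$. It lists all integral solutions of $Ax = 0$ with $\|x\| \le 1 + (n\beta)^{m/(n-m)}$. Since $n - m = 2$, it also compares the smallest product $\|x_1\|\,\|x_2\|$ over linearly independent pairs with the Bombieri–Vaaler bound $(\det AA^t)^{1/2}/g$. This is a brute-force check only, not an implementation of either proof.

```python
import itertools
import math
from functools import reduce


def sup_norm(x):
    return max(abs(c) for c in x)


def kernel_points(A, bound):
    """All nonzero integral x with Ax = 0 and sup-norm at most bound."""
    n = len(A[0])
    return [x for x in itertools.product(range(-bound, bound + 1), repeat=n)
            if any(x) and all(sum(a * c for a, c in zip(row, x)) == 0 for row in A)]


def independent_3d(u, v):
    # two vectors of R^3 are linearly independent iff their cross product is nonzero
    return (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]) != (0, 0, 0)


A = [[3, 5, 7]]
m, n = len(A), len(A[0])
beta = max(abs(a) for row in A for a in row)
siegel_bound = 1 + (n * beta) ** (m / (n - m))  # 1 + sqrt(21), about 5.58
sols = kernel_points(A, math.floor(siegel_bound))

# For m = 1, det(A A^t) is the squared length of the single row and g is the gcd of its entries.
g = reduce(math.gcd, A[0])
bv_bound = math.sqrt(sum(a * a for a in A[0])) / g  # sqrt(83), about 9.11
best_pair = min(sup_norm(u) * sup_norm(v)
                for u in sols for v in sols if independent_3d(u, v))
print(min(map(sup_norm, sols)), best_pair, bv_bound)
```

Here the shortest solutions are $\pm(1,-2,1)$. A shortest solution independent of them is $(2,3,-3)$. The product $2 \cdot 3 = 6$ is comfortably below $\sqrt{83}$.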

The second application, due to Gillet and Soulé (1991), may be regarded as an
arithmetic analogue of the Riemann–Roch theorem for function fields. Again let $K$ be
a compact symmetric convex subset of $\mathbb{R}^n$ with nonempty interior and let $\mu_i$ denote
the infimum of all $\rho > 0$ such that $\rho K$ contains at least $i$ linearly independent points
of $\mathbb{Z}^n$ $(i = 1,\ldots,n)$. If $M(K)$ is the number of points of $\mathbb{Z}^n$ in $K$, and if $h$ is the maximum
number of linearly independent points of $\mathbb{Z}^n$ in the interior of $K$, then Gillet and
Soulé show that $\mu_1 \cdots \mu_h\,M(K)$ is bounded above and below by positive constants,
which depend on $n$ but not on $K$.

A number of results in this section have dealt with compact symmetric convex sets 
with nonempty interior. Since such sets may appear rather special, it should be pointed 
out that they arise very naturally in connection with normed vector spaces. 

The vector space $\mathbb{R}^n$ is said to be normed if with each $x \in \mathbb{R}^n$ there is associated a
real number $|x|$ with the properties

(i) $|x| \ge 0$, with equality if and only if $x = O$,
(ii) $|x + y| \le |x| + |y|$ for all $x, y \in \mathbb{R}^n$,
(iii) $|\alpha x| = |\alpha|\,|x|$ for all $x \in \mathbb{R}^n$ and all $\alpha \in \mathbb{R}$.

Let $K$ denote the set of all $x \in \mathbb{R}^n$ such that $|x| \le 1$. Then $K$ is bounded, since all
norms on a finite-dimensional vector space are equivalent. In fact $K$ is compact, since
it follows from (ii) that $K$ is closed. Moreover $K$ is convex and symmetric, by (ii) and
(iii). Furthermore, by (i) and (iii), $x/|x| \in K$ for each nonzero $x \in \mathbb{R}^n$. Hence the
interior of $K$ is nonempty and is actually the set of all $x \in \mathbb{R}^n$ such that $|x| < 1$.
Conversely, let $K$ be a compact symmetric convex subset of $\mathbb{R}^n$ with nonempty
interior. Then the origin is an interior point of $K$ and for each nonzero $x \in \mathbb{R}^n$ there is a
unique $\rho > 0$ such that $\rho x$ is on the boundary of $K$. If we put $|x| = \rho^{-1}$, and $|O| = 0$,
then (i) obviously holds. Furthermore, since $|{-x}| = |x|$, it is easily seen that (iii) holds.
Finally, if $y \in \mathbb{R}^n$ and $|y| = \sigma^{-1}$, then $\rho x, \sigma y \in K$ and hence, since $K$ is convex,

$$\rho\sigma(\rho + \sigma)^{-1}(x + y) = \sigma(\rho + \sigma)^{-1}\rho x + \rho(\rho + \sigma)^{-1}\sigma y \in K.$$

Hence

$$|x + y| \le (\rho + \sigma)/(\rho\sigma) = |x| + |y|.$$

Thus $\mathbb{R}^n$ is a normed vector space and $K$ the set of all $x \in \mathbb{R}^n$ such that $|x| \le 1$.
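The norm constructed in the converse is the gauge of $K$: $|x| = \inf\{t > 0 : x/t \in K\}$, which equals $\rho^{-1}$ in the notation above. The sketch below computes it by bisection from a membership test alone. It assumes a number `upper` with $x/\text{upper} \in K$. The example body is a centrally symmetric hexagon $\{x : |a_i^t x| \le 1\}$ with three hypothetical normals $a_i$, whose gauge has the closed form $\max_i |a_i^t x|$. The accompanying checks compare against that closed form and test property (ii) on random samples.

```python
import math
import random

NORMALS = [(1.0, 0.0), (0.5, math.sqrt(3) / 2), (-0.5, math.sqrt(3) / 2)]


def in_hexagon(x):
    # K = {x : |a_i . x| <= 1 for each normal a_i}: compact, convex and symmetric
    return all(abs(a[0] * x[0] + a[1] * x[1]) <= 1.0 for a in NORMALS)


def gauge(x, in_K, upper=1e6, steps=200):
    """|x| = inf{t > 0 : x/t in K}, found by bisection (assumes x/upper lies in K)."""
    if all(c == 0 for c in x):
        return 0.0
    lo, hi = 0.0, upper
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if in_K([c / mid for c in x]):
            hi = mid  # x/mid lies in K, so |x| <= mid
        else:
            lo = mid  # x/mid lies outside K, so |x| > mid
    return hi


random.seed(1)
samples = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(200)]
print(gauge((2.0, 0.0), in_hexagon), gauge((0.0, 1.0), in_hexagon))
```

Bisection works because $\{t > 0 : x/t \in K\}$ is a half-line $[|x|, \infty)$ whenever $K$ is convex and contains $O$.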
4 Voronoi Cells

Throughout this section we suppose $\mathbb{R}^n$ equipped with the Euclidean metric:

$$d(y, z) = \|y - z\|,$$

where $\|x\| = (x^t x)^{1/2}$. We call $\|x\|^2 = x^t x$ the square-norm of $x$ and we denote the
scalar product $y^t z$ by $(y, z)$.

Fix some point $x_0 \in \mathbb{R}^n$. For any point $x \ne x_0$, the set of all points which are
equidistant from $x_0$ and $x$ is the hyperplane $H_x$ which passes through the midpoint of
the segment joining $x_0$ and $x$ and is orthogonal to this segment. Analytically, $H_x$ is the
set of all $y \in \mathbb{R}^n$ such that

$$(x - x_0, y) = (x - x_0, x + x_0)/2,$$

which simplifies to

$$2(x - x_0, y) = \|x\|^2 - \|x_0\|^2.$$

The set of all points which are closer to $x_0$ than to $x$ is the open half-space $G_x$ consisting
of all points $y \in \mathbb{R}^n$ such that

$$2(x - x_0, y) < \|x\|^2 - \|x_0\|^2.$$

The closed half-space $\bar G_x = H_x \cup G_x$ is the set of all points at least as close to $x_0$ as to $x$.
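The equivalence between ‘closer to $x_0$ than to $x$’ and the linear inequality defining $G_x$ is an exact algebraic identity. It can therefore be tested with integer coordinates, free of rounding error. A small sketch follows; the function names are illustrative.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def in_G(x0, x, y):
    """y in the open half-space G_x: 2(x - x0, y) < ||x||^2 - ||x0||^2."""
    return 2 * dot([a - b for a, b in zip(x, x0)], y) < dot(x, x) - dot(x0, x0)


def closer_to_x0(x0, x, y):
    """Direct test d(x0, y) < d(x, y), comparing squared Euclidean distances."""
    d0 = [a - b for a, b in zip(y, x0)]
    d1 = [a - b for a, b in zip(y, x)]
    return dot(d0, d0) < dot(d1, d1)


random.seed(0)
triples = [tuple([random.randint(-9, 9) for _ in range(3)] for _ in range(3))
           for _ in range(5000)]
agree = all(in_G(x0, x, y) == closer_to_x0(x0, x, y) for x0, x, y in triples)
print(agree)
```

Expanding $\|y - x\|^2 - \|y - x_0\|^2 = \|x\|^2 - \|x_0\|^2 - 2(x - x_0, y)$ shows why the two tests always agree.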

Let $X$ be a subset of $\mathbb{R}^n$ containing more than one point which is discrete, i.e. for
each $y \in \mathbb{R}^n$ there exists an open set containing $y$ which contains at most one point
of $X$. It follows that each bounded subset of $\mathbb{R}^n$ contains only finitely many points of
$X$ since, if there were infinitely many, they would have an accumulation point. Hence
for each $y \in \mathbb{R}^n$ there exists an $x_0 \in X$ whose distance from $y$ is minimal:

$$d(x_0, y) \le d(x, y) \quad \text{for every } x \in X. \qquad (1)$$

For each $x_0 \in X$ we define its Voronoi cell $V(x_0)$ to be the set of all $y \in \mathbb{R}^n$ for
which (1) holds. Voronoi cells are also called ‘Dirichlet domains’, since they were
used by Dirichlet (1850) in $\mathbb{R}^2$ before Voronoi (1908) used them in $\mathbb{R}^n$.

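If we choose $r > 0$ so that the open ball

$$B_r(x_0) = \{y \in \mathbb{R}^n : d(x_0, y) < r\}$$

contains no point of $X$ except $x_0$, then $B_{r/2}(x_0) \subseteq V(x_0)$. Thus $x_0$ is an interior point
of $V(x_0)$.
Since

$$\bar G_x = \{y \in \mathbb{R}^n : d(x_0, y) \le d(x, y)\},$$

we have $V(x_0) \subseteq \bar G_x$ and actually

$$V(x_0) = \bigcap_{x \in X \setminus \{x_0\}} \bar G_x. \qquad (2)$$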
It follows at once from (2) that $V(x_0)$ is closed and convex. Hence $V(x_0)$ is the closure
of its nonempty interior.
According to the definitions of §3, the Voronoi cells form a tiling of $\mathbb{R}^n$, since

$$\mathbb{R}^n = \bigcup_{x \in X} V(x), \qquad \operatorname{int} V(x) \cap \operatorname{int} V(x') = \emptyset \ \text{ if } x, x' \in X \text{ and } x \ne x'.$$


A subset $A$ of a convex set $C$ is said to be a face of $C$ if $A$ is convex and, for
any $c, c' \in C$, $(c, c') \cap A \ne \emptyset$ implies $c, c' \in A$, where $(c, c')$ denotes the open segment
joining $c$ and $c'$. The tiling by Voronoi cells has the
additional property that $V(x) \cap V(x')$ is a face of both $V(x)$ and $V(x')$ if $x, x' \in X$
and $x \ne x'$. We will prove this by showing that if $y_1, y_2$ are distinct points of $V(x)$
and if $z \in (y_1, y_2) \cap V(x')$, then $y_1 \in V(x')$.

Since $z \in V(x) \cap V(x')$, we have $d(x, z) = d(x', z)$. Thus $z$ lies on the hyperplane
$H$ which passes through the midpoint of the segment joining $x$ and $x'$ and is orthogonal
to this segment. If $y_1 \notin V(x')$, then $d(x, y_1) < d(x', y_1)$. Thus $y_1$ lies in the open
half-space $G$ associated with the hyperplane $H$ which contains $x$. But then $y_2$ lies in
the open half-space $G'$ which contains $x'$, i.e. $d(x', y_2) < d(x, y_2)$, which contradicts
$y_2 \in V(x)$.

We now assume that the set $X$ is not only discrete, but also relatively dense, i.e.

(†) there exists $R > 0$ such that, for each $y \in \mathbb{R}^n$, there is some $x \in X$ with
$d(x, y) < R$.

It follows at once that $V(x_0) \subseteq B_R(x_0)$. Thus $V(x_0)$ is bounded and, since it is
closed, even compact. The ball $B_{2R}(x_0)$ contains only finitely many points $x_1,\ldots,x_m$
of $X$ apart from $x_0$. We are going to show that

$$V(x_0) = \bigcap_{i=1}^{m} \bar G_{x_i}. \qquad (3)$$

By (2) we need only show that if $y \in \bigcap_{i=1}^{m} \bar G_{x_i}$, then $y \in \bar G_x$ for every $x \in X$.
Assume that $d(x_0, y) \ge R$ and choose $z$ on the segment joining $x_0$ and $y$ so that
$d(x_0, z) = R$. For some $x \in X$ we have $d(x, z) < R$ and hence $0 < d(x_0, x) < 2R$.
Consequently $x = x_i$ for some $i \in \{1,\ldots,m\}$. Since $d(x_i, z) < R = d(x_0, z)$, we
have $z \notin \bar G_{x_i}$. But this is a contradiction, since $x_0, y \in \bar G_{x_i}$ and $z$ is on the segment
joining them.
We conclude that $d(x_0, y) < R$. If $x \in X$ and $x \ne x_0, x_1,\ldots,x_m$, then

$$d(x, y) \ge d(x_0, x) - d(x_0, y) > 2R - R = R > d(x_0, y).$$

Consequently $y \in \bar G_x$ for every $x \in X$.

It follows from (3) that $V(x_0)$ is a polyhedron. Since $V(x_0)$ is bounded and has a
nonempty interior, it is actually an $n$-dimensional polytope.

The faces of a polytope are an important part of its structure. An $(n-1)$-dimensional
face of an $n$-dimensional polytope is said to be a facet and a 0-dimensional
face is said to be a vertex. We now apply to $V(x_0)$ some properties common to all polytopes.
In the representation (3) it may be possible to omit some closed half-spaces $\bar G_{x_i}$
without affecting the validity of the representation. By omitting as many half-spaces as
possible we obtain an irredundant representation, which by suitable choice of notation
we may take to be

$$V(x_0) = \bigcap_{i=1}^{l} \bar G_{x_i}$$

for some $l \le m$. The intersections $V(x_0) \cap H_{x_i}$ $(1 \le i \le l)$ are then the distinct facets
of $V(x_0)$. Any nonempty proper face of $V(x_0)$ is contained in a facet and is the intersection
of those facets which contain it. Furthermore, any nonempty face of $V(x_0)$ is
the convex hull of those vertices of $V(x_0)$ which it contains.

It follows that for each $x_i$ $(1 \le i \le l)$ there is a vertex $v_i$ of $V(x_0)$ such that

$$d(x_0, v_i) = d(x_i, v_i).$$

For $d(x_0, v) \le d(x_i, v)$ for every vertex $v$ of $V(x_0)$. Assume that $d(x_0, v) < d(x_i, v)$
for every vertex $v$ of $V(x_0)$. Then the open half-space $G_{x_i}$ contains all vertices $v$ and
hence also their convex hull $V(x_0)$. But this is a contradiction, since $V(x_0) \cap H_{x_i}$ is a
facet of $V(x_0)$.

To illustrate these results take $X = \mathbb{Z}^n$ and $x_0 = O$. Then the Voronoi cell
$V(O)$ is the cube consisting of all points $y = (\eta_1,\ldots,\eta_n) \in \mathbb{R}^n$ with $|\eta_i| \le 1/2$
$(i = 1,\ldots,n)$. It has the minimal number $2n$ of facets.

In fact any lattice $\Lambda$ in $\mathbb{R}^n$ is discrete and has the property (†). For a lattice $\Lambda$ we
can restrict attention to the Voronoi cell $V(\Lambda) := V(O)$, since an arbitrary Voronoi
cell is obtained from it by a translation: $V(x_0) = V(O) + x_0$. The Voronoi cell of
a lattice has extra properties. Since $x \in \Lambda$ implies $-x \in \Lambda$, $y \in V(\Lambda)$ implies
$-y \in V(\Lambda)$. Furthermore, if $x_i$ is a lattice vector determining a facet of $V(\Lambda)$ and if
$y \in V(\Lambda) \cap H_{x_i}$, then $\|y\| = \|y - x_i\|$. Since $x \in \Lambda$ implies $x_i - x \in \Lambda$, it follows
that $y \in V(\Lambda) \cap H_{x_i}$ implies $x_i - y \in V(\Lambda) \cap H_{x_i}$. Thus the Voronoi cell $V(\Lambda)$ and
all its facets are centrosymmetric.

In addition, any orthogonal transformation of $\mathbb{R}^n$ which maps onto itself the lattice
$\Lambda$ also maps onto itself the Voronoi cell $V(\Lambda)$. Furthermore the Voronoi cell $V(\Lambda)$
has volume $d(\Lambda)$, by Proposition 11, since the lattice translates of $V(\Lambda)$ form a tiling
of $\mathbb{R}^n$.
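The statement that $V(\Lambda)$ has volume $d(\Lambda)$ can be checked in the plane by Monte Carlo integration. The sketch makes two assumptions. The lattice is given by a basis matrix $T$ whose columns are basis vectors. The lattice vectors $Tz$ with coefficients in $[-2, 2]$ include all those needed in (2), which holds for the two examples used ($\mathbb{Z}^2$ and the hexagonal lattice) but not for arbitrary bases. A sample point $y$ is counted when $2(x, y) \le \|x\|^2$ for all these $x$, i.e. when $y \in \bar G_x$ with $x_0 = O$. The function name and sample size are illustrative.

```python
import itertools
import math

import numpy as np


def voronoi_area(T, half_width, samples=200_000, R=2, seed=0):
    """Monte Carlo estimate of the area of V(Lambda) for Lambda = T Z^2.
    Assumes V(Lambda) lies inside the square [-half_width, half_width]^2."""
    rng = np.random.default_rng(seed)
    coeffs = np.array([c for c in itertools.product(range(-R, R + 1), repeat=2) if any(c)])
    X = coeffs @ T.T  # each row is a nonzero lattice vector T z
    Y = rng.uniform(-half_width, half_width, size=(samples, 2))
    # y lies in V(Lambda) iff 2(x, y) <= ||x||^2 for every lattice vector x considered
    inside = np.all(2 * Y @ X.T <= np.sum(X * X, axis=1), axis=1)
    return (2 * half_width) ** 2 * inside.mean()


hexagonal = np.array([[1.0, 0.5], [0.0, math.sqrt(3) / 2]])
for T in (np.eye(2), hexagonal):
    print(round(voronoi_area(T, 0.6), 3), round(abs(np.linalg.det(T)), 3))
```

For $\mathbb{Z}^2$ the cell is the square $|\eta_i| \le 1/2$ of area 1. For the hexagonal lattice it is a regular hexagon of area $\sqrt{3}/2$.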

We define a facet vector or ‘relevant vector’ of a lattice $\Lambda$ to be a vector $x_i \in \Lambda$
such that $V(\Lambda) \cap H_{x_i}$ is a facet of the Voronoi cell $V(\Lambda)$. If $V(\Lambda)$ is contained in the
closed ball $\bar B_R = \{x \in \mathbb{R}^n : \|x\| \le R\}$, then every facet vector $x_i$ satisfies $\|x_i\| \le 2R$.
For, if $y \in V(\Lambda) \cap H_{x_i}$, then, by Schwarz’s inequality (Chapter I, §4),

$$\|x_i\|^2 = 2(x_i, y) \le 2\|x_i\|\,\|y\| \le 2R\,\|x_i\|.$$

The facet vectors were characterized by Voronoi (1908) in the following way:

Proposition 13 A nonzero vector $x \in \Lambda$ is a facet vector of the lattice $\Lambda \subseteq \mathbb{R}^n$ if and
only if every vector $x' \in x + 2\Lambda$, except $\pm x$, satisfies $\|x'\| > \|x\|$.


Proof Suppose first that $\|x\| < \|x'\|$ for all $x' \ne \pm x$ such that $(x' - x)/2 \in \Lambda$. If
$z \in \Lambda$ and $x' = 2z - x$, then $(x' - x)/2 \in \Lambda$. Hence if $z \ne O, x$ then

$$\|x/2\| < \|z - x/2\|,$$

i.e. $x/2 \in G_z$. Since $\|x/2\| = \|x - x/2\|$, it follows that $x/2 \in V(\Lambda)$. Moreover $x/2$ lies on $H_x$
but in the open half-space $G_z$ for every $z \ne O, x$, so $V(\Lambda) \cap H_x$ is a facet and $x$ is a facet
vector.

Suppose next that there exists $x' \ne \pm x$ such that $w = (x' - x)/2 \in \Lambda$ and
$\|x'\| \le \|x\|$. Then also $z = (x' + x)/2 \in \Lambda$ and $z, w \ne O$. If $y \in \bar G_z \cap \bar G_{-w}$, then

$$2(z, y) \le \|z\|^2, \qquad -2(w, y) \le \|w\|^2.$$

Hence, by the parallelogram law (Chapter I, §10),

$$2(x, y) = 2(z, y) - 2(w, y) \le \|z\|^2 + \|w\|^2 = \|x\|^2/2 + \|x'\|^2/2 \le \|x\|^2.$$

That is, $y \in \bar G_x$. Thus $\bar G_x$ is not needed to define $V(\Lambda)$ and $x$ is not a facet vector.


Any lattice $\Lambda$ contains a nonzero vector with minimal square-norm. Such a vector
will be called a minimal vector. Its square-norm will be called the minimum of $\Lambda$ and
will be denoted by $m(\Lambda)$.

Proposition 14 If $\Lambda \subseteq \mathbb{R}^n$ is a lattice with minimum $m(\Lambda)$, then any nonzero vector
in $\Lambda$ with square-norm $< 2m(\Lambda)$ is a facet vector. In particular, any minimal vector is
a facet vector.

Proof Put $r = m(\Lambda)$ and let $x$ be a nonzero vector in $\Lambda$ with $\|x\|^2 < 2r$. If $x$ is not
a facet vector, there exists $y \ne \pm x$ with $(y - x)/2 \in \Lambda$ such that $\|y\| \le \|x\|$. Since
$(y \pm x)/2$ are nonzero vectors of $\Lambda$, $\|x \pm y\|^2 \ge 4r$. Thus

$$4r \le \|x\|^2 + \|y\|^2 \pm 2(x, y) < 4r \pm 2(x, y),$$

which is impossible, since it would require both $(x, y) > 0$ and $(x, y) < 0$.


Proposition 15 For any lattice $\Lambda \subseteq \mathbb{R}^n$, the number of facets of its Voronoi cell $V(\Lambda)$
is at most $2(2^n - 1)$.

Proof Let $x_1,\ldots,x_n$ be a basis for $\Lambda$. Then any vector $x \in \Lambda$ has a unique representation
$x = x' + x''$, where $x' \in 2\Lambda$ and

$$x'' = \alpha_1 x_1 + \cdots + \alpha_n x_n,$$

with $\alpha_j \in \{0, 1\}$ for $j = 1,\ldots,n$. Thus the number of cosets of $2\Lambda$ in $\Lambda$ is $2^n$. But,
by Proposition 13, each coset contains at most one pair $\pm y$ of facet vectors. Since $2\Lambda$
itself does not contain any facet vectors, the total number of facet vectors is at most
$2(2^n - 1)$.


There exist lattices $\Lambda \subseteq \mathbb{R}^n$ for which the upper bound of Proposition 15 is
attained, e.g. the lattice $\Lambda = \{Tz : z \in \mathbb{Z}^n\}$ with $T = I + \beta J$, where $J$ denotes
the $n \times n$ matrix every element of which is 1 and $\beta = \{(1+n)^{-1/2} - 1\}/n$, so that
$T^t T = I - J/(n+1)$.
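Proposition 13 turns directly into a finite computation. Split the lattice vectors into the $2^n - 1$ nonzero cosets of $2\Lambda$ and keep a coset’s minimal vectors exactly when they form a single pair $\pm x$. The sketch below does this by brute force. It assumes that coefficients in $[-R, R]$ reach the minimal vectors of every coset, which holds for the small examples checked here. The names `facet_vectors` and `T_beta` are hypothetical.

```python
import itertools

import numpy as np


def facet_vectors(T, R=3, tol=1e-9):
    """Facet vectors (as coefficient vectors) of Lambda = T Z^n, via Proposition 13.
    Assumes the box [-R, R]^n contains the minimal vectors of every coset of 2*Lambda."""
    n = T.shape[0]
    best = {}  # parity class -> (least square-norm, coefficient vectors attaining it)
    for z in itertools.product(range(-R, R + 1), repeat=n):
        parity = tuple(c % 2 for c in z)
        if not any(parity):
            continue  # z lies in 2*Lambda (this includes O): never a facet vector
        x = T @ np.array(z, dtype=float)
        q = float(x @ x)
        if parity not in best or q < best[parity][0] - tol:
            best[parity] = (q, [z])
        elif abs(q - best[parity][0]) <= tol:
            best[parity][1].append(z)
    # a coset contributes only if its minimum is attained by exactly one pair +-x
    return [z for q, zs in best.values() if len(zs) == 2 for z in zs]


def T_beta(n, exponent):
    """T = I + beta J with beta = ((1 + n)^exponent - 1) / n."""
    beta = ((1 + n) ** exponent - 1) / n
    return np.eye(n) + beta * np.ones((n, n))


print(len(facet_vectors(np.eye(2))))  # Z^2: the 2n = 4 facets of the square
for n in (2, 3):
    print(n, len(facet_vectors(T_beta(n, -0.5))), 2 * (2 ** n - 1))
```

For $n = 2$ and $n = 3$ the count reaches $2(2^n - 1)$, namely 6 and 14. If the exponent $-1/2$ is replaced by $+1/2$, then $T^tT = I + J$. For $n = 3$ this is a face-centred cubic lattice, and the same count gives only 12.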


Proposition 16 Every vector of a lattice $\Lambda \subseteq \mathbb{R}^n$ is an integral linear combination of
facet vectors.

Proof Let $b_1,\ldots,b_m$ be the facet vectors of $\Lambda$ and put

$$\Lambda' = \{x = \beta_1 b_1 + \cdots + \beta_m b_m : \beta_1,\ldots,\beta_m \in \mathbb{Z}\}.$$

Evidently $\Lambda'$ is a subgroup of $\mathbb{R}^n$ and actually a discrete subgroup, since $\Lambda' \subseteq \Lambda$. If $\Lambda'$
were contained in a hyperplane of $\mathbb{R}^n$ any point on the line through the origin orthogonal
to this hyperplane would belong to the Voronoi cell $V$ of $\Lambda$, which is impossible
because $V$ is bounded. Hence $\Lambda'$ contains $n$ linearly independent vectors.

Thus $\Lambda'$ is a sublattice of $\Lambda$. It follows that the Voronoi cell $V$ of $\Lambda$ is contained in
the Voronoi cell $V'$ of $\Lambda'$. But if $y \in V'$, then

$$\|y\| \le \|b_i - y\| \quad (i = 1,\ldots,m)$$

and hence $y \in V$. Thus $V' = V$. Hence the $\Lambda'$-translates of $V$ and the $\Lambda$-translates of
$V$ are both tilings of $\mathbb{R}^n$. Since $\Lambda' \subseteq \Lambda$, this is possible only if $\Lambda' = \Lambda$.


Since every integral linear combination of facet vectors is in the lattice, Proposi- 
tion 16 implies 


Corollary 17 Distinct lattices in $\mathbb{R}^n$ have distinct Voronoi cells.

Proposition 16 does not say that the lattice has a basis of facet vectors. It is known
that every lattice in $\mathbb{R}^n$ has a basis of facet vectors if $n \le 6$, but if $n > 6$ this is still an
open question. It is known also that every lattice in $\mathbb{R}^n$ has a basis of minimal vectors
when $n \le 4$ but, when $n > 4$, there are lattices with no such basis. In fact a lattice may
have no basis of minimal vectors, even though every lattice vector is an integral linear
combination of minimal vectors.

Lattices and their Voronoi cells have long been used in crystallography. An 
$n$-dimensional crystal may be defined mathematically to be a subset of $\mathbb{R}^n$ of the form

$$F + \Lambda = \{x + y : x \in F,\ y \in \Lambda\},$$

where $F$ is a finite set and $\Lambda$ a lattice. Crystals may be studied by means of their
symmetry groups.

An isometry of $\mathbb{R}^n$ is an invertible affine transformation which leaves unaltered the
Euclidean distance between any two points. For example, any orthogonal transformation
is an isometry and so is a translation by an arbitrary vector $v$. Any isometry is the
composite of a translation and an orthogonal transformation. The symmetry group of a
set $X \subseteq \mathbb{R}^n$ is the group of all isometries of $\mathbb{R}^n$ which map $X$ to itself.

We define an $n$-dimensional crystallographic group to be a group $G$ of isometries
of $\mathbb{R}^n$ such that the vectors corresponding to translations in $G$ form an $n$-dimensional
lattice. It is not difficult to show that a subset of $\mathbb{R}^n$ is an $n$-dimensional crystal if and
only if it is discrete and its symmetry group is an $n$-dimensional crystallographic group.

It was shown by Bieberbach (1911) that a group $G$ of isometries of $\mathbb{R}^n$ is a
crystallographic group if and only if it is discrete and has a compact fundamental domain
$D$, i.e. the sets $\{g(D) : g \in G\}$ form a tiling of $\mathbb{R}^n$. He could then show that the
translations in a crystallographic group form a torsion-free abelian normal subgroup
of finite index. He showed later (1912) that two crystallographic groups $G_1, G_2$ are
isomorphic if and only if there exists an invertible affine transformation $A$ such that
$G_2 = A^{-1}G_1A$. With the aid of results of Minkowski and Jordan it follows that,
for a given dimension $n$, there are only finitely many non-isomorphic crystallographic
groups. These results provided a positive answer to the first part of the 18th Problem
of Hilbert (1900).

The structure of physical crystals is analysed by means of the corresponding 
3-dimensional crystallographic groups. A stronger concept than isomorphism is useful 
for such applications. Two crystallographic groups $G_1, G_2$ may be said to be properly
isomorphic if there exists an orientation-preserving invertible affine transformation $A$
such that $G_2 = A^{-1}G_1A$. An isomorphism class of crystallographic groups either
coincides with a proper isomorphism class or splits into two distinct proper isomor- 
phism classes. 

Fedorov (1891) showed that there are 17 isomorphism classes of 2-dimensional 
crystallographic groups, none of which splits. Collating earlier work of Sohncke 
(1879), Schoenflies (1889) and himself, Fedorov (1892) also showed that there are 219 
isomorphism classes of 3-dimensional crystallographic groups, 11 of which split. More 
recently, Brown et al. (1978) have shown that there are 4783 isomorphism classes of 
4-dimensional crystallographic groups, 112 of which split. 


5 Densest Packings 


The result of Hermite, mentioned at the beginning of the chapter, can be formulated
in terms of lattices instead of quadratic forms. For any real non-singular matrix $T$, the
matrix

$$A = T^t T$$

is a real positive definite symmetric matrix. Conversely, by a principal axes transformation,
or more simply by induction, it may be seen that any real positive definite
symmetric matrix $A$ may be represented in this way.

Let $\Lambda$ be the lattice

$$\Lambda = \{y = Tx \in \mathbb{R}^n : x \in \mathbb{Z}^n\}$$

and put

$$\gamma(\Lambda) = m(\Lambda)/d(\Lambda)^{2/n},$$

where $d(\Lambda)$ is the determinant and $m(\Lambda)$ the minimum of $\Lambda$. Then $\gamma(\rho\Lambda) = \gamma(\Lambda)$
for any $\rho > 0$. Hermite’s result that there exists a positive constant $c_n$, depending only
on $n$, such that $0 < x^t A x \le c_n (\det A)^{1/n}$ for some $x \in \mathbb{Z}^n$ may be restated in the form

$$\gamma(\Lambda) \le c_n.$$

Hermite’s constant $\gamma_n$ is defined to be the least positive constant $c_n$ such that this
inequality holds for all $\Lambda \subseteq \mathbb{R}^n$.

It may be shown that $\gamma_n^n$ is a rational number for each $n$. It follows from Proposition 2
that $\limsup_{n\to\infty} \gamma_n/n \le 2/(\pi e)$. Minkowski (1905) showed also that

$$\liminf_{n\to\infty} \gamma_n/n \ge 1/(2\pi e) = 0.0585\ldots,$$

and it is possible that actually $\lim_{n\to\infty} \gamma_n/n = 1/(2\pi e)$. The significance of Hermite’s
constant derives from its connection with lattice packings of balls, as we now explain.

Let $\Lambda$ be a lattice in $\mathbb{R}^n$ and $K$ a subset of $\mathbb{R}^n$ which is the closure of a nonempty
open set $G$. We say that $\Lambda$ gives a lattice packing for $K$ if the family of translates
$K + x$ $(x \in \Lambda)$ is a packing of $\mathbb{R}^n$, i.e. if for any two distinct points $x, y \in \Lambda$ the
interiors $G + x$ and $G + y$ are disjoint. This is the same as saying that $\Lambda$ does not contain
the difference of any two distinct points of the interior of $K$, since $g + x = g' + y$ if
and only if $g' - g = x - y$. If $K$ is a compact symmetric convex set with nonempty
interior $G$, it is the same as saying that the interior of the set $2K$ contains no nonzero
point of $\Lambda$, since in this case $g, g' \in G$ implies $(g' - g)/2 \in G$ and $2g = g - (-g)$.

The density of the lattice packing, i.e. the fraction of the total space which is
occupied by translates of $K$, is clearly $\lambda(K)/d(\Lambda)$. Hence the maximum density of
any lattice packing for $K$ is

$$\delta(K) = \lambda(K)/\Delta(2K) = 2^{-n}\lambda(K)/\Delta(K),$$

where $\Delta(K)$ is the critical determinant of $K$, as defined in §3. The use of the word
‘maximum’ is justified, since it will be shown in §6 that the infimum involved in the
definition of critical determinant is attained.

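Our interest is in the special case of a closed ball: $K = B_\rho = \{x \in \mathbb{R}^n : \|x\| \le \rho\}$.
By what we have said, $\Lambda$ gives a lattice packing for $B_\rho$ if and only if the interior of
$B_{2\rho}$ contains no nonzero point of $\Lambda$, i.e. if and only if $m(\Lambda)^{1/2} \ge 2\rho$. Hence

$$\begin{aligned}
\delta(B_\rho) &= \sup\{\lambda(B_\rho)/d(\Lambda) : m(\Lambda)^{1/2} = 2\rho\} \\
&= \kappa_n \rho^n \sup\{d(\Lambda)^{-1} : m(\Lambda)^{1/2} = 2\rho\},
\end{aligned}$$

where $\kappa_n = \pi^{n/2}/(n/2)!$ again denotes the volume of the unit ball in $\mathbb{R}^n$. By virtue of
homogeneity it follows that

$$\delta_n := \delta(B_\rho) = 2^{-n}\kappa_n \sup_{\Lambda} \gamma(\Lambda)^{n/2},$$

where the supremum is now over all lattices $\Lambda \subseteq \mathbb{R}^n$; that is, in terms of Hermite’s
constant $\gamma_n$,

$$\delta_n = 2^{-n}\kappa_n \gamma_n^{n/2}.$$

Thus $\gamma_n$, like $\delta_n$, measures the densest lattice packing of balls. A lattice $\Lambda \subseteq \mathbb{R}^n$ for
which $\gamma(\Lambda) = \gamma_n$, i.e. a critical lattice for a ball, will be called simply a densest lattice.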
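The same homogeneity argument shows that the lattice packing of balls given by a particular lattice $\Lambda$ has density $2^{-n}\kappa_n\gamma(\Lambda)^{n/2}$. Taking the supremum over $\Lambda$ gives $\delta_n$. The sketch below evaluates this for a few lattices given by basis matrices. It finds $m(\Lambda)$ by brute force, assuming coefficients in $[-3, 3]$ reach a minimal vector, and computes $\kappa_n$ through the Gamma function, $(n/2)! = \Gamma(n/2 + 1)$. The function names are illustrative.

```python
import itertools
import math

import numpy as np


def hermite_gamma(T, R=3):
    """gamma(Lambda) = m(Lambda) / d(Lambda)^(2/n) for Lambda = T Z^n, by brute force.
    Assumes the coefficient box [-R, R]^n contains a minimal vector."""
    n = T.shape[0]
    d = abs(np.linalg.det(T))
    m = min(float(np.sum((T @ np.array(c, dtype=float)) ** 2))
            for c in itertools.product(range(-R, R + 1), repeat=n) if any(c))
    return m / d ** (2 / n)


def packing_density(T):
    """Density 2^(-n) kappa_n gamma(Lambda)^(n/2) of the lattice packing of balls."""
    n = T.shape[0]
    kappa = math.pi ** (n / 2) / math.gamma(n / 2 + 1)  # volume of the unit ball
    return 2.0 ** (-n) * kappa * hermite_gamma(T) ** (n / 2)


hexagonal = np.array([[1.0, 0.5], [0.0, math.sqrt(3) / 2]])
fcc = np.array([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])  # face-centred cubic
for T in (np.eye(2), hexagonal, np.eye(3), fcc):
    print(round(hermite_gamma(T), 4), round(packing_density(T), 4))
```

The printed densities are $\pi/4$ and $\pi/(2\sqrt{3})$ in the plane, and $\pi/6$ and $\pi/(3\sqrt{2})$ in space.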

The densest lattice in $\mathbb{R}^n$ is known for each $n \le 8$, and is uniquely determined apart
from isometries and scalar multiples. In fact these densest lattices are all examples of 
indecomposable root lattices. These terms will now be defined. 

A lattice $\Lambda$ is said to be decomposable if there exist additive subgroups $\Lambda_1, \Lambda_2$
of $\Lambda$, each containing a nonzero vector, such that $(x_1, x_2) = 0$ for all $x_1 \in \Lambda_1$ and
$x_2 \in \Lambda_2$, and every vector in $\Lambda$ is the sum of a vector in $\Lambda_1$ and a vector in $\Lambda_2$. Since
$\Lambda_1$ and $\Lambda_2$ are necessarily discrete, they are lattices in the wide sense (i.e. they are not
full-dimensional). We say also that $\Lambda$ is the orthogonal sum of the lattices $\Lambda_1$ and $\Lambda_2$.
The orthogonal sum of any finite number of lattices is defined similarly. A lattice is
indecomposable if it is not decomposable.

The following result was first proved by Eichler (1952). 


Proposition 18 Any lattice A is an orthogonal sum of finitely many indecomposable 
lattices, which are uniquely determined apart from order. 


Proof (i) Define a vector x € A to be ‘decomposable’ if there exist nonzero vectors 
xX1,x2 € A such that x = x; + x2 and (x1, x2) = 0. We show first that every nonzero 
x € Ais asum of finitely many indecomposable vectors. 

By definition, $x$ is either indecomposable or is the sum of two nonzero orthogonal
vectors in $\Lambda$. Both these vectors have square-norm less than the square-norm of $x$ (since
$(x, x) = (x_1, x_1) + (x_2, x_2)$), and for each of them the same alternative presents itself.
Continuing in this way, we must eventually arrive at indecomposable vectors, since there
are only finitely many vectors in $\Lambda$ with square-norm less than that of $x$.

(ii) If $\Lambda$ is the orthogonal sum of finitely many lattices $L_\nu$, then, by the definition
of an orthogonal sum, every indecomposable vector of $\Lambda$ lies in one of the sublattices
$L_\nu$. Hence if two indecomposable vectors are not orthogonal, they lie in the same
sublattice $L_\nu$.

(iii) Call two indecomposable vectors $x, x'$ ‘equivalent’ if there exist indecomposable
vectors $x = x_0, x_1, \ldots, x_{k-1}, x_k = x'$ such that $(x_j, x_{j+1}) \ne 0$ for $0 \le j < k$.
Clearly ‘equivalence’ is indeed an equivalence relation and thus the set of all indecomposable
vectors is partitioned into equivalence classes $\mathcal{C}_\mu$. Two vectors from different
equivalence classes are orthogonal and, if $\Lambda$ is an orthogonal sum of lattices $L_\nu$ as in
(ii), then two vectors from the same equivalence class lie in the same sublattice $L_\nu$.

(iv) Let $\Lambda_\mu$ be the subgroup of $\Lambda$ generated by the vectors in the equivalence class
$\mathcal{C}_\mu$. Then, by (i), $\Lambda$ is generated by the sublattices $\Lambda_\mu$. Since, by (iii), $\Lambda_\mu$ is orthogonal
to $\Lambda_{\mu'}$ if $\mu \ne \mu'$, $\Lambda$ is actually the orthogonal sum of the sublattices $\Lambda_\mu$. If $\Lambda$ is
an orthogonal sum of lattices $L_\nu$ as in (ii), then each $\Lambda_\mu$ is contained in some $L_\nu$. It
follows that each $\Lambda_\mu$ is indecomposable and that these indecomposable sublattices are
uniquely determined apart from order.
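Step (iii) is, in effect, a connected-components computation: join two indecomposable vectors when they are not orthogonal and take the classes of the resulting graph. The Python sketch below does this with union-find for a finite list of vectors. A minimal vector is always indecomposable (a decomposition $x = x_1 + x_2$ would give $(x, x) = (x_1, x_1) + (x_2, x_2) \ge 2m(\Lambda)$), so we feed it the minimal vectors of small root lattices. The function names are illustrative only, not taken from the text.

```python
from itertools import combinations, product


def dot(u, v):
    """Euclidean inner product of two coordinate tuples."""
    return sum(a * b for a, b in zip(u, v))


def orthogonality_classes(vectors):
    """Group vectors into the classes of step (iii): x ~ x' when they are
    joined by a chain of vectors, each non-orthogonal to the next."""
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(vectors)), 2):
        if dot(vectors[i], vectors[j]) != 0:
            parent[find(i)] = find(j)  # merge the two classes

    classes = {}
    for i, v in enumerate(vectors):
        classes.setdefault(find(i), []).append(v)
    return list(classes.values())


def d4_minimal_vectors():
    """The 24 minimal vectors +-e_j +- e_k (j < k) of D_4."""
    vecs = []
    for j, k in combinations(range(4), 2):
        for sj, sk in product((1, -1), repeat=2):
            v = [0, 0, 0, 0]
            v[j], v[k] = sj, sk
            vecs.append(tuple(v))
    return vecs


# Minimal vectors of A_2 (first three coordinates) orthogonally summed
# with A_1 (last two coordinates), inside R^5.
a2 = [(1, -1, 0, 0, 0), (1, 0, -1, 0, 0), (0, 1, -1, 0, 0)]
a2_plus_a1 = a2 + [tuple(-c for c in v) for v in a2] + [(0, 0, 0, 1, -1), (0, 0, 0, -1, 1)]

print(sorted(len(c) for c in orthogonality_classes(a2_plus_a1)))
print(len(orthogonality_classes(d4_minimal_vectors())))
```

For the orthogonal sum of $A_2$ and $A_1$ the sketch reports two classes, of sizes 6 and 2, while the 24 minimal vectors of $D_4$ form a single class, consistent with $D_4$ being indecomposable.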


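Let $\Lambda$ be a lattice in $\mathbb{R}^n$. If $\Lambda \subseteq \Lambda^*$, i.e. if $(x, y) \in \mathbb{Z}$ for all $x, y \in \Lambda$, then $\Lambda$ is said
to be integral. If $(x, x)$ is an even integer for every $x \in \Lambda$, then $\Lambda$ is said to be even.
(It follows that an even lattice is also integral, since $2(x, y) = (x + y, x + y) - (x, x) - (y, y)$.)
If $\Lambda$ is even and every vector in $\Lambda$ is an integral linear combination of vectors in $\Lambda$ with
square-norm 2, then $\Lambda$ is said to be a root lattice.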

Thus in a root lattice the minimal vectors have square-norm 2. It may be shown by a
long, but elementary, argument that any root lattice has a basis of minimal vectors such
that every minimal vector is an integral linear combination of the basis vectors with
coefficients which are all nonnegative or all nonpositive. Such a basis will be called a
simple basis. The facet vectors of a root lattice are precisely the minimal vectors, and
hence its Voronoi cell is the set of all $y \in \mathbb{R}^n$ such that $(y, x) \le 1$ for every minimal
vector $x$ (the condition $\|y\| \le \|y - x\|$ is equivalent to $2(y, x) \le (x, x) = 2$).
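This description of the Voronoi cell can be checked numerically on a small example. The Python sketch below (helper names are illustrative) compares the criterion $(y, x) \le 1$, taken over the minimal vectors of $D_4$, with the definition of the Voronoi cell, for every point $y$ of a grid in $[-1, 1]^4$.

```python
from itertools import combinations, product


def dot(u, v):
    """Euclidean inner product of two coordinate tuples."""
    return sum(a * b for a, b in zip(u, v))


def d4_minimal_vectors():
    """The 24 minimal vectors +-e_j +- e_k (j < k) of D_4."""
    vecs = []
    for j, k in combinations(range(4), 2):
        for sj, sk in product((1, -1), repeat=2):
            v = [0, 0, 0, 0]
            v[j], v[k] = sj, sk
            vecs.append(tuple(v))
    return vecs


def in_cell_via_minimal_vectors(y, minimal_vectors):
    """Criterion from the text: (y, x) <= 1 for every minimal vector x."""
    return all(dot(y, x) <= 1 for x in minimal_vectors)


def in_cell_brute_force(y, lattice_points):
    """Definition: no lattice point is strictly closer to y than the origin."""
    norm_y = dot(y, y)
    for z in lattice_points:
        diff = tuple(a - b for a, b in zip(y, z))
        if dot(diff, diff) < norm_y:
            return False
    return True


# D_4 points in the box [-2, 2]^4.  For y in [-1, 1]^4 we have ||y|| <= 2, and a
# lattice point with some |z_i| >= 3 is at distance >= 2 from y, so points
# outside the box can never be strictly closer to y than the origin is.
box = [z for z in product(range(-2, 3), repeat=4) if sum(z) % 2 == 0]
grid = list(product((-1.0, -0.5, 0.0, 0.5, 1.0), repeat=4))
mins = d4_minimal_vectors()
mismatches = [y for y in grid
              if in_cell_via_minimal_vectors(y, mins) != in_cell_brute_force(y, box)]
print(len(grid), len(mismatches))
```

All grid coordinates are dyadic, so the floating-point comparisons are exact; the two tests agree on every grid point.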

Any root lattice is an orthogonal sum of indecomposable root lattices. It was shown
by Witt (1941) that the indecomposable root lattices can be completely enumerated;
they are all listed in Table 1. We give also their minimal vectors in terms of the
canonical basis $e_1, \ldots, e_n$ of $\mathbb{R}^n$.

Table 1. Indecomposable root lattices

$A_n = \{x = (\xi_0, \xi_1, \ldots, \xi_n) \in \mathbb{Z}^{n+1} : \xi_0 + \xi_1 + \cdots + \xi_n = 0\}$ $(n \ge 1)$;
$D_n = \{x = (\xi_1, \ldots, \xi_n) \in \mathbb{Z}^n : \xi_1 + \cdots + \xi_n \text{ even}\}$ $(n \ge 3)$;
$E_8 = D_8 \cup D_8^+$, where $D_8^+ = (1/2, 1/2, \ldots, 1/2) + D_8$;
$E_7 = \{x = (\xi_1, \ldots, \xi_8) \in E_8 : \xi_7 = -\xi_8\}$;
$E_6 = \{x = (\xi_1, \ldots, \xi_8) \in E_8 : \xi_6 = \xi_7 = -\xi_8\}$.

The lattice $A_n$ has $n(n+1)$ minimal vectors, namely the vectors $\pm(e_j - e_k)$
$(0 \le j < k \le n)$, and the vectors $e_0 - e_1, e_1 - e_2, \ldots, e_{n-1} - e_n$ form a simple
basis. By calculating the determinant of $B^t B$, where $B$ is the $(n+1) \times n$ matrix whose
columns are the vectors of this simple basis, it may be seen that the determinant of the
lattice $A_n$ is $(n+1)^{1/2}$.

The lattice $D_n$ has $2n(n-1)$ minimal vectors, namely the vectors $\pm e_j \pm e_k$
$(1 \le j < k \le n)$. The vectors $e_1 - e_2, e_2 - e_3, \ldots, e_{n-1} - e_n, e_{n-1} + e_n$ form a
simple basis and hence the lattice $D_n$ has determinant 2.
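The counts and determinants for $A_n$ and $D_n$ can be confirmed by a short exact computation. In the Python sketch below the vectors of square-norm 2 are found by exhaustive search (an integer vector of square-norm 2 has all coordinates in $\{-1, 0, 1\}$), and $d(\Lambda)^2$ is computed as the Gram determinant $\det(B^tB)$ of the simple basis, using rational arithmetic from the standard `fractions` module. Coordinates are indexed from 0 in the code, and the helper names are illustrative.

```python
from fractions import Fraction
from itertools import product


def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))


def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n, result = len(m), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if m[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            m[c], m[p], result = m[p], m[c], -result
        result *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            m[r] = [a - f * b for a, b in zip(m[r], m[c])]
    return result


def gram_det(basis):
    """det(B^t B) for basis vectors given as rows; this equals d(Lambda)^2."""
    return det([[dot(u, v) for v in basis] for u in basis])


def e(i, dim):
    """The canonical basis vector e_i of R^dim (indices start at 0)."""
    return tuple(int(k == i) for k in range(dim))


def count_square_norm_2(dim, in_lattice):
    """Count lattice vectors of square-norm 2 by searching {-1, 0, 1}^dim."""
    return sum(1 for x in product((-1, 0, 1), repeat=dim)
               if dot(x, x) == 2 and in_lattice(x))


def a_simple_basis(n):
    """e_0 - e_1, ..., e_{n-1} - e_n in R^(n+1)."""
    return [tuple(a - b for a, b in zip(e(i, n + 1), e(i + 1, n + 1))) for i in range(n)]


def d_simple_basis(n):
    """e_1 - e_2, ..., e_{n-1} - e_n, e_{n-1} + e_n (shifted to 0-based indices)."""
    basis = [tuple(a - b for a, b in zip(e(i, n), e(i + 1, n))) for i in range(n - 1)]
    basis.append(tuple(a + b for a, b in zip(e(n - 2, n), e(n - 1, n))))
    return basis


def in_a(x):
    return sum(x) == 0          # xi_0 + ... + xi_n = 0


def in_d(x):
    return sum(x) % 2 == 0      # coordinate sum even


for n in range(1, 7):
    print("A", n, count_square_norm_2(n + 1, in_a), gram_det(a_simple_basis(n)))
for n in range(3, 7):
    print("D", n, count_square_norm_2(n, in_d), gram_det(d_simple_basis(n)))
```

For $A_n$ the two numbers printed are $n(n+1)$ and $n+1$, and for $D_n$ they are $2n(n-1)$ and $4$, in agreement with $d(A_n) = (n+1)^{1/2}$ and $d(D_n) = 2$.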

The lattice $E_8$ has 240 minimal vectors, namely the 112 vectors $\pm e_j \pm e_k$
$(1 \le j < k \le 8)$ and the 128 vectors $(\pm e_1 \pm \cdots \pm e_8)/2$ with an even number of minus
signs. The vectors

$$v_1 = (e_1 - e_2 - \cdots - e_7 + e_8)/2, \quad v_2 = e_1 + e_2,$$
$$v_3 = e_2 - e_1, \quad v_4 = e_3 - e_2, \quad \ldots, \quad v_8 = e_7 - e_6,$$

form a simple basis and hence the lattice has determinant 1.

The lattice $E_7$ has 126 minimal vectors, namely the 60 vectors $\pm e_j \pm e_k$ $(1 \le j < k \le 6)$,
the vectors $\pm(e_7 - e_8)$ and the 64 vectors $\pm\left(\sum_{j=1}^{6}(\pm e_j) - e_7 + e_8\right)/2$ with
an odd number of minus signs in the sum. The vectors $v_1, \ldots, v_7$ form a simple basis
and the lattice has determinant $2^{1/2}$.

The lattice $E_6$ has 72 minimal vectors, namely the 40 vectors $\pm e_j \pm e_k$ $(1 \le j < k \le 5)$
and the 32 vectors $\pm\left(\sum_{j=1}^{5}(\pm e_j) - e_6 - e_7 + e_8\right)/2$ with an even number of
minus signs in the sum. The vectors $v_1, \ldots, v_6$ form a simple basis and the lattice has
determinant $3^{1/2}$.
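The statements about $E_8$, $E_7$ and $E_6$ can likewise be checked by machine. The Python sketch below stores vectors in doubled coordinates $2x$, so that half-integers become integers. It lists the minimal vectors of $E_8$, selects those of $E_7$ and $E_6$ by the conditions of Table 1, computes the Gram determinants of $v_1, \ldots, v_8$, of $v_1, \ldots, v_7$ and of $v_1, \ldots, v_6$ (these are $d(\Lambda)^2$), and tests that every minimal vector of $E_8$ has integral coefficients of constant sign with respect to $v_1, \ldots, v_8$. It is an illustrative check, not a proof, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

# E_8 vectors are stored in doubled coordinates d = 2x.


def dot2(u, v):
    """Inner product (x, y), computed from doubled coordinates 2x and 2y."""
    return Fraction(sum(a * b for a, b in zip(u, v)), 4)


def in_e8(d):
    """Membership of x = d/2 in E_8 = D_8 u D_8^+; in both cases the
    condition 'even coordinate sum' becomes sum(d) = 0 mod 4."""
    all_even = all(c % 2 == 0 for c in d)
    all_odd = all(c % 2 == 1 for c in d)
    return (all_even or all_odd) and sum(d) % 4 == 0


def e8_minimal_vectors():
    """All x in E_8 with (x, x) = 2: such x has either two coordinates +-1
    (the rest 0) or all eight coordinates +-1/2, so these candidates suffice."""
    candidates = []
    for j, k in combinations(range(8), 2):
        for sj, sk in product((2, -2), repeat=2):
            d = [0] * 8
            d[j], d[k] = sj, sk
            candidates.append(tuple(d))
    candidates += list(product((1, -1), repeat=8))
    return [d for d in candidates if in_e8(d)]


def simple_basis():
    """v_1, ..., v_8 from the text, in doubled coordinates."""
    basis = [(1, -1, -1, -1, -1, -1, -1, 1),   # v_1 = (e_1 - e_2 - ... - e_7 + e_8)/2
             (2, 2, 0, 0, 0, 0, 0, 0),         # v_2 = e_1 + e_2
             (-2, 2, 0, 0, 0, 0, 0, 0)]        # v_3 = e_2 - e_1
    for i in range(2, 7):                      # v_4, ..., v_8 = e_3 - e_2, ..., e_7 - e_6
        d = [0] * 8
        d[i], d[i - 1] = 2, -2
        basis.append(tuple(d))
    return basis


def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n, result = len(m), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if m[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            m[c], m[p], result = m[p], m[c], -result
        result *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            m[r] = [a - f * b for a, b in zip(m[r], m[c])]
    return result


def gram_det(vectors):
    return det([[dot2(u, v) for v in vectors] for u in vectors])


def solve(a, b):
    """Solve the square linear system a c = b exactly (Gauss-Jordan)."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [u - f * w for u, w in zip(m[r], m[col])]
    return [row[n] for row in m]


roots = e8_minimal_vectors()
basis = simple_basis()
columns = [list(row) for row in zip(*basis)]   # basis vectors as matrix columns


def sign_coherent(c):
    """Integral coefficients, all >= 0 or all <= 0."""
    integral = all(v.denominator == 1 for v in c)
    return integral and (all(v >= 0 for v in c) or all(v <= 0 for v in c))


e7 = [d for d in roots if d[6] == -d[7]]            # xi_7 = -xi_8
e6 = [d for d in roots if d[5] == d[6] == -d[7]]    # xi_6 = xi_7 = -xi_8
print(len(roots), len(e7), len(e6))
print(gram_det(basis), gram_det(basis[:7]), gram_det(basis[:6]))
print(all(sign_coherent(solve(columns, x)) for x in roots))
```

The three printed lines are `240 126 72`, `1 2 3` and `True`, matching the counts above and the determinants $1$, $2^{1/2}$ and $3^{1/2}$.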

We now return to lattice packings of balls. The densest lattices for $n \le 8$ are given
in Table 2. These lattices were shown to be densest by Lagrange (1773) for $n = 2$,
by Gauss (1831) for $n = 3$, by Korkine and Zolotareff (1872, 1877) for $n = 4, 5$ and
by Blichfeldt (1925, 1926, 1934) for $n = 6, 7, 8$.
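As a numerical cross-check of the formula $\delta_n = 2^{-n}\kappa_n\gamma_n^{n/2}$, the Python sketch below recomputes $\gamma(\Lambda)$ and the packing density for the lattices of Table 2. It uses the determinants found above for $A_n$, $D_n$, $E_6$, $E_7$, $E_8$ together with $m(\Lambda) = 2$. Two assumptions are made explicit: the normalization $\gamma(\Lambda) = m(\Lambda)/d(\Lambda)^{2/n}$ for Hermite's invariant, and the convention $(n/2)! = \Gamma(n/2 + 1)$ for odd $n$.

```python
import math

# Densest lattices with their determinants d(Lambda); each is a root
# lattice, so its minimal square-norm is m(Lambda) = 2.
DENSEST = {1: ("A1", math.sqrt(2)), 2: ("A2", math.sqrt(3)), 3: ("D3", 2.0),
           4: ("D4", 2.0), 5: ("D5", 2.0), 6: ("E6", math.sqrt(3)),
           7: ("E7", math.sqrt(2)), 8: ("E8", 1.0)}


def unit_ball_volume(n):
    """kappa_n = pi^(n/2) / (n/2)!, with (n/2)! = Gamma(n/2 + 1)."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


def hermite_invariant(m, d, n):
    """gamma(Lambda) = m(Lambda) / d(Lambda)^(2/n)."""
    return m / d ** (2 / n)


def lattice_packing_density(gamma, n):
    """delta = 2^(-n) * kappa_n * gamma^(n/2)."""
    return 2.0 ** (-n) * unit_ball_volume(n) * gamma ** (n / 2)


for n, (name, d) in DENSEST.items():
    g = hermite_invariant(2.0, d, n)
    print(n, name, round(g, 4), round(lattice_packing_density(g, n), 4))
```

The printed densities agree with the closed forms in Table 2, for example $\pi^4/384$ for $E_8$.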

Although the densest lattice in $\mathbb{R}^n$ is unknown for every $n > 8$, there are plausible
candidates in some dimensions. In particular, a lattice discovered by Leech (1967) is
believed to be densest in 24 dimensions. This lattice may be constructed in the following
way. Let $p$ be a prime such that $p \equiv 3 \bmod 4$ and let $H_n$ be the Hadamard matrix
of order $n = p + 1$ constructed by Paley’s method (see Chapter V, §2). The columns
of the matrix
Table 2. Densest lattices in $\mathbb{R}^n$

$n$   $\Lambda$   $\gamma_n$                        $\delta_n$
1     $A_1$       $1$                               $1$
2     $A_2$       $(4/3)^{1/2} = 1.1547\ldots$      $3^{1/2}\pi/6 = 0.9068\ldots$
3     $D_3$       $2^{1/3} = 1.2599\ldots$          $2^{1/2}\pi/6 = 0.7404\ldots$
4     $D_4$       $2^{1/2} = 1.4142\ldots$          $\pi^2/16 = 0.6168\ldots$
5     $D_5$       $8^{1/5} = 1.5157\ldots$          $2^{1/2}\pi^2/30 = 0.4652\ldots$
6     $E_6$       $(64/3)^{1/6} = 1.6653\ldots$     $3^{1/2}\pi^3/144 = 0.3729\ldots$
7     $E_7$       $64^{1/7} = 1.8114\ldots$         $\pi^3/105 = 0.2952\ldots$
8     $E_8$       $2$                               $\pi^4/384 = 0.2536\ldots$


$$T = (n/4 + 1)^{-1/2}\begin{pmatrix} (n/4 + 1)I_n & H_n - I_n \\ O & I_n \end{pmatrix}$$

generate a lattice in $\mathbb{R}^{2n}$. For $p = 3$ we obtain the root lattice $E_8$ and for $p = 11$ the
Leech lattice $\Lambda_{24}$.


Leech’s lattice may be characterized as the unique even lattice $\Lambda$ in $\mathbb{R}^{24}$ with
$d(\Lambda) = 1$ and $m(\Lambda) > 2$. It was shown by Conway (1969) that, if $G$ is the group
of all orthogonal transformations of $\mathbb{R}^{24}$ which map the Leech lattice $\Lambda_{24}$ onto itself,
then the factor group $G/\{\pm I_{24}\}$ is a finite simple group, and two more finite simple
groups are easily obtained as (stabilizer) subgroups. These are three of the 26 sporadic
simple groups which were mentioned in §7 of Chapter V.

Leech’s lattice has 196560 minimal vectors of square-norm 4. Thus the packing of
unit balls associated with $\Lambda_{24}$ is such that each ball touches 196560 other balls. It has
been shown that 196560 is the maximal number of nonoverlapping unit balls in $\mathbb{R}^{24}$
which can touch another unit ball and that, up to isometry, there is only one possible
arrangement.

Similarly, since $E_8$ has 240 minimal vectors of square-norm 2, the packing of balls
of radius $2^{-1/2}$ associated with $E_8$ is such that each ball touches 240 other balls. It has
been shown that 240 is the maximal number of nonoverlapping balls of fixed radius in
$\mathbb{R}^8$ which can touch another ball of the same radius and that, up to isometry, there is
only one possible arrangement.

In general, one may ask what is the kissing number of $\mathbb{R}^n$, i.e. the maximal number
of nonoverlapping unit balls in $\mathbb{R}^n$ which can touch another unit ball. The question,
for $n = 3$, first arose in 1694 in a discussion between Newton, who claimed that the
answer was 12, and Gregory, who said 13. It was first shown by Hoppe (1874) that
Newton was right, but in this case the arrangement of the 12 balls in $\mathbb{R}^3$ is not unique
up to isometry. One possibility is to take the centres of the 12 balls to be the vertices
of a regular icosahedron, the centre of which is the centre of the unit ball they touch.

The kissing number of $\mathbb{R}^1$ is clearly 2. It is not difficult to show that the kissing
number of $\mathbb{R}^2$ is 6 and that the centres of the six unit balls must be the vertices of a
regular hexagon, the centre of which is the centre of the unit ball they touch. For $n > 3$
the kissing number of $\mathbb{R}^n$ is unknown, except for the two cases $n = 8$ and $n = 24$
already mentioned.
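The two configurations just described can be examined numerically. In the Python sketch below (illustrative names), the centres lie at distance 2 from the origin, so each unit ball touches the central unit ball, and the balls are nonoverlapping exactly when all pairwise distances between centres are at least 2. The icosahedron is built from the cyclic permutations of $(0, \pm 1, \pm\varphi)$, with $\varphi$ the golden ratio.

```python
import math
from itertools import combinations, product


def icosahedron_centres(radius=2.0):
    """Vertices of a regular icosahedron (cyclic permutations of
    (0, +-1, +-phi)), scaled to lie at distance `radius` from the origin."""
    phi = (1 + math.sqrt(5)) / 2
    raw = []
    for s1, s2 in product((1, -1), repeat=2):
        raw += [(0, s1, s2 * phi), (s1, s2 * phi, 0), (s2 * phi, 0, s1)]
    scale = radius / math.hypot(1, phi)
    return [tuple(scale * c for c in p) for p in raw]


def hexagon_centres(radius=2.0):
    """Vertices of a regular hexagon at distance `radius` from the origin."""
    return [(radius * math.cos(k * math.pi / 3), radius * math.sin(k * math.pi / 3))
            for k in range(6)]


def min_pairwise_distance(points):
    return min(math.dist(p, q) for p, q in combinations(points, 2))


ico = icosahedron_centres()
hexa = hexagon_centres()
print(len(ico), min_pairwise_distance(ico))
print(len(hexa), min_pairwise_distance(hexa))
```

For the icosahedron the minimal distance between centres is about $2.10$, so the 12 balls do not touch one another; this slack is what allows them to be moved, in line with the non-uniqueness noted above. For the hexagon the minimal distance is 2 (up to rounding), so neighbouring balls touch.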
6 Mahler’s Compactness Theorem


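It is useful to study not only individual lattices, but also the family $\mathcal{L}_n$ of all lattices
in $\mathbb{R}^n$. A sequence of lattices $\Lambda_k \in \mathcal{L}_n$ will be said to converge to a lattice $\Lambda \in \mathcal{L}_n$,
in symbols $\Lambda_k \to \Lambda$, if there exist bases $b_{k1}, \ldots, b_{kn}$ of $\Lambda_k$ $(k = 1, 2, \ldots)$ and a basis
$b_1, \ldots, b_n$ of $\Lambda$ such that

$$b_{kj} \to b_j \text{ as } k \to \infty \quad (j = 1, \ldots, n).$$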


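Evidently this implies that $d(\Lambda_k) \to d(\Lambda)$ as $k \to \infty$. Also, for any $x \in \Lambda$ there
exist $x_k \in \Lambda_k$ such that $x_k \to x$ as $k \to \infty$. In fact if $x = \alpha_1 b_1 + \cdots + \alpha_n b_n$, where
$\alpha_j \in \mathbb{Z}$ $(j = 1, \ldots, n)$, we can take $x_k = \alpha_1 b_{k1} + \cdots + \alpha_n b_{kn}$.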

It is not obvious from the definition that the limit of a sequence of lattices is 
uniquely determined, but this follows at once from the next result. 


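Proposition 19 Let $\Lambda$ be a lattice in $\mathbb{R}^n$ and let $\{\Lambda_k\}$ be a sequence of lattices in $\mathbb{R}^n$
such that $\Lambda_k \to \Lambda$ as $k \to \infty$. If $x_k \in \Lambda_k$ and $x_k \to x$ as $k \to \infty$, then $x \in \Lambda$.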


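Proof With the above notation,
$$x = \alpha_1 b_1 + \cdots + \alpha_n b_n,$$
where $\alpha_i \in \mathbb{R}$ $(i = 1, \ldots, n)$, and similarly
$$x_k = \alpha_{k1} b_1 + \cdots + \alpha_{kn} b_n,$$
where $\alpha_{ki} \in \mathbb{R}$ and $\alpha_{ki} \to \alpha_i$ as $k \to \infty$ $(i = 1, \ldots, n)$.

The linear transformation $T_k$ of $\mathbb{R}^n$ which maps $b_i$ to $b_{ki}$ $(i = 1, \ldots, n)$ can be
written in the form
$$T_k = I - A_k,$$
where $A_k \to O$ as $k \to \infty$. It follows that
$$T_k^{-1} = (I - A_k)^{-1} = I + A_k + A_k^2 + \cdots = I + C_k,$$
where also $C_k \to O$ as $k \to \infty$. Hence
$$x_k = T_k^{-1}(\alpha_{k1} b_{k1} + \cdots + \alpha_{kn} b_{kn}) = (\alpha_{k1} + \eta_{k1}) b_{k1} + \cdots + (\alpha_{kn} + \eta_{kn}) b_{kn},$$
where $\eta_{ki} \to 0$ as $k \to \infty$ $(i = 1, \ldots, n)$. But $\alpha_{ki} + \eta_{ki} \in \mathbb{Z}$ for every $k$, since $x_k \in \Lambda_k$.
Letting $k \to \infty$, we obtain $\alpha_i \in \mathbb{Z}$. That is, $x \in \Lambda$.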


It is natural to ask if the Voronoi cells of a convergent sequence of lattices also
converge in some sense. The required notion of convergence is in fact older than the
notion of convergence of lattices and applies to arbitrary compact subsets of $\mathbb{R}^n$.

The Hausdorff distance $h(K, K')$ between two nonempty compact subsets $K, K'$ of $\mathbb{R}^n$ is
defined to be the infimum of all $\rho \ge 0$ such that every point of $K$ is distant at most
$\rho$ from some point of $K'$ and every point of $K'$ is distant at most $\rho$ from some point
of $K$. We will show that this defines a metric, the Hausdorff metric, on the space of all
nonempty compact subsets of $\mathbb{R}^n$.
Evidently
$$0 \le h(K, K') = h(K', K) < \infty.$$


Moreover $h(K, K') = 0$ implies $K = K'$. For if $x' \in K'$, there exist $x_k \in K$ such that
$x_k \to x'$ and hence $x' \in K$, since $K$ is closed. Thus $K' \subseteq K$, and similarly $K \subseteq K'$.
Finally we prove the triangle inequality
$$h(K, K'') \le h(K, K') + h(K', K'').$$


To simplify writing, put $\rho = h(K, K')$ and $\rho' = h(K', K'')$. For any $\varepsilon > 0$, if
$x \in K$ there exists $x' \in K'$ such that $\|x - x'\| < \rho + \varepsilon$ and then $x'' \in K''$ such
that $\|x' - x''\| < \rho' + \varepsilon$. Hence
$$\|x - x''\| < \rho + \rho' + 2\varepsilon.$$
Similarly, if $x'' \in K''$ there exists $x \in K$ for which the same inequality holds. Since $\varepsilon$
can be arbitrarily small, this completes the proof.
The definition of Hausdorff distance can also be expressed in the form
$$h(K, K') = \inf\{\rho \ge 0 : K \subseteq K' + B_\rho,\ K' \subseteq K + B_\rho\},$$
where $B_\rho = \{x \in \mathbb{R}^n : \|x\| \le \rho\}$. A sequence $K_j$ of compact subsets of $\mathbb{R}^n$ converges
to a compact subset $K$ of $\mathbb{R}^n$ if $h(K_j, K) \to 0$ as $j \to \infty$.
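For finite sets, which are compact, both infima in the definition are attained, so $h(K, K')$ reduces to a max-min computation. A small Python sketch (function names are illustrative), including finite samples $\{(i/j, 0) : 0 \le i \le j\}$ of the unit segment:

```python
import math


def hausdorff_distance(K, L):
    """Hausdorff distance between two nonempty finite subsets of R^n:
    max(max_{x in K} dist(x, L), max_{y in L} dist(y, K))."""
    def one_sided(A, B):
        return max(min(math.dist(a, b) for b in B) for a in A)
    return max(one_sided(K, L), one_sided(L, K))


def segment_sample(j):
    """The finite set {(i/j, 0) : i = 0, ..., j} on the unit segment."""
    return [(i / j, 0.0) for i in range(j + 1)]


K = [(0.0, 0.0), (1.0, 0.0)]
L = [(0.0, 0.0)]
print(hausdorff_distance(K, L), hausdorff_distance(L, K))
print(hausdorff_distance(segment_sample(4), segment_sample(8)))
```

Here $h(K, L) = h(L, K) = 1$. The two samples of the segment are at distance $1/8$: the coarser sample is contained in the finer one, but the midpoints $(1/8, 0), (3/8, 0), \ldots$ of the finer sample are at distance $1/8$ from the coarser one.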

It was shown by Hausdorff (1927) that any uniformly bounded sequence of compact
subsets of $\mathbb{R}^n$ has a convergent subsequence. In particular, any uniformly bounded
sequence of compact convex subsets of $\mathbb{R}^n$ has a subsequence which converges to
a compact convex set. This special case of Hausdorff’s result, which is all that we
will later require, had already been established by Blaschke (1916) and is known as
Blaschke’s selection principle.


Proposition 20 Let $\{\Lambda_k\}$ be a sequence of lattices in $\mathbb{R}^n$ and let $V_k$ be the Voronoi
cell of $\Lambda_k$. If there exists a compact convex set $V$ with nonempty interior such that
$V_k \to V$ in the Hausdorff metric as $k \to \infty$, then $V$ is the Voronoi cell of a lattice $\Lambda$
and $\Lambda_k \to \Lambda$ as $k \to \infty$.


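Proof Since every Voronoi cell $V_k$ is symmetric, so also is the limit $V$. Since $V$ has
nonempty interior, it follows that the origin is itself an interior point of $V$ (if $y$ is an
interior point, so is $-y$, and hence so is their midpoint $0$, by convexity). Thus there
exists $\delta > 0$ such that the ball $B_\delta = \{x \in \mathbb{R}^n : \|x\| \le \delta\}$ is contained in $V$.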

It follows that $B_{\delta/2} \subseteq V_k$ for all large $k$. The quickest way to see this is to use
Rådström’s cancellation law, which says that if $A, B, C$ are nonempty compact convex
subsets of $\mathbb{R}^n$ such that $A + C \subseteq B + C$, then $A \subseteq B$. In the present case we have
$$B_{\delta/2} + B_{\delta/2} = B_\delta \subseteq V \subseteq V_k + B_{\delta/2} \quad \text{for } k \ge k_0,$$
and hence $B_{\delta/2} \subseteq V_k$ for $k \ge k_0$. Since also $V_k \subseteq V + B_{\delta/2}$ for all large $k$, there exists
$R > 0$ such that $V_k \subseteq B_R$ for all $k$.
The lattice $\Lambda_k$ has at most $2(2^n - 1)$ facet vectors, by Proposition 15. Hence, by
restriction to a subsequence, we may assume that all $\Lambda_k$ have the same number $m$ of
facet vectors. Let $x_{k1}, \ldots, x_{km}$ be the facet vectors of $\Lambda_k$ and choose the notation so
that $x_{k1}, \ldots, x_{kn}$ are linearly independent. Since they all lie in the ball $B_{2R}$, by restriction
to a further subsequence we may assume that
$$x_{kj} \to x_j \text{ as } k \to \infty \quad (j = 1, \ldots, m).$$
Evidently $\|x_j\| \ge \delta$ $(j = 1, \ldots, m)$ since, for $k \ge k_0$, all nonzero $x \in \Lambda_k$ have
$\|x\| \ge \delta$.
The set $\Lambda$ of all integral linear combinations of $x_1, \ldots, x_m$ is certainly an additive
subgroup of $\mathbb{R}^n$. Moreover $\Lambda$ is discrete. For suppose $y \in \Lambda$ and $\|y\| < \delta$. We have
$$y = \alpha_1 x_1 + \cdots + \alpha_m x_m,$$
where $\alpha_j \in \mathbb{Z}$ $(j = 1, \ldots, m)$. If
$$y_k = \alpha_1 x_{k1} + \cdots + \alpha_m x_{km},$$
then $y_k \to y$ as $k \to \infty$ and hence $\|y_k\| < \delta$ for all large $k$. Since $y_k \in \Lambda_k$, it follows
that $y_k = 0$ for all large $k$ and hence $y = 0$.
Since the lattice $\Lambda'_k$ with basis $x_{k1}, \ldots, x_{kn}$ is a sublattice of $\Lambda_k$, we have
$$d(\Lambda'_k) \ge d(\Lambda_k) = \lambda(V_k) \ge \lambda(B_{\delta/2}).$$
Since $d(\Lambda'_k) = |\det(x_{k1}, \ldots, x_{kn})|$, it follows that also
$$|\det(x_1, \ldots, x_n)| \ge \lambda(B_{\delta/2}) > 0.$$
Thus the vectors $x_1, \ldots, x_n$ are linearly independent. Hence $\Lambda$ is a lattice.
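Let $b_1, \ldots, b_n$ be a basis of $\Lambda$. Then, by the definition of $\Lambda$,
$$b_i = \alpha_{i1} x_1 + \cdots + \alpha_{im} x_m,$$
where $\alpha_{ij} \in \mathbb{Z}$ $(1 \le i \le n,\ 1 \le j \le m)$. Put
$$b_{ki} = \alpha_{i1} x_{k1} + \cdots + \alpha_{im} x_{km}.$$
Then $b_{ki} \in \Lambda_k$ and $b_{ki} \to b_i$ as $k \to \infty$ $(i = 1, \ldots, n)$. Hence, for all large $k$, the
vectors $b_{k1}, \ldots, b_{kn}$ are linearly independent. We are going to show that $b_{k1}, \ldots, b_{kn}$
is a basis of $\Lambda_k$ for all large $k$.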

Since $b_1, \ldots, b_n$ is a basis of $\Lambda$, we have
$$x_j = \gamma_{j1} b_1 + \cdots + \gamma_{jn} b_n,$$
where $\gamma_{ji} \in \mathbb{Z}$ $(1 \le i \le n,\ 1 \le j \le m)$. Hence, if
$$y_{kj} = \gamma_{j1} b_{k1} + \cdots + \gamma_{jn} b_{kn},$$
then $y_{kj} \in \Lambda_k$ and $y_{kj} \to x_j$ as $k \to \infty$ $(j = 1, \ldots, m)$. Thus, for all large $k$,
$$\|y_{kj} - x_{kj}\| < \delta \quad (j = 1, \ldots, m).$$
Since $y_{kj} - x_{kj} \in \Lambda_k$, this implies that, for all large $k$, $y_{kj} = x_{kj}$ $(j = 1, \ldots, m)$. Thus
every facet vector of $\Lambda_k$ is an integral linear combination of $b_{k1}, \ldots, b_{kn}$ and hence,
by Proposition 16, every vector of $\Lambda_k$ is an integral linear combination of $b_{k1}, \ldots, b_{kn}$.
Since $b_{k1}, \ldots, b_{kn}$ are linearly independent, this shows that they are a basis of $\Lambda_k$.

Let $W$ be the Voronoi cell of $\Lambda$. We wish to show that $V = W$. If $v \in V$, then
there exist $v_k \in V_k$ such that $v_k \to v$. Assume $v \notin W$. Then $\|v\| > \|z - v\|$ for some
$z \in \Lambda$, and so
$$\|v\| = \|z - v\| + \rho,$$
where $\rho > 0$. There exist $z_k \in \Lambda_k$ such that $z_k \to z$. Then, for all large $k$,
$$\|v_k\| > \|z_k - v_k\| + \rho/2$$
and hence, for all large $k$,
$$\|v_k\| > \|z_k - v_k\|.$$
But this contradicts $v_k \in V_k$.
This proves that $V \subseteq W$. On the other hand, $V$ has volume
$$\lambda(V) = \lim_{k\to\infty}\lambda(V_k) = \lim_{k\to\infty} d(\Lambda_k) = \lim_{k\to\infty}|\det(b_{k1}, \ldots, b_{kn})| = |\det(b_1, \ldots, b_n)| = d(\Lambda) = \lambda(W).$$
It follows that every interior point of $W$ is in $V$, and hence $W = V$. Corollary 17 now
shows that the same lattice $\Lambda$ would have been obtained if we had restricted attention
to some other subsequence of $\{\Lambda_k\}$.

Let $a_1, \ldots, a_n$ be any basis of $\Lambda$. We are going to show that, for the sequence $\{\Lambda_k\}$
originally given, there exist $a_{ki} \in \Lambda_k$ such that
$$a_{ki} \to a_i \text{ as } k \to \infty \quad (i = 1, \ldots, n).$$
If this is not the case then, for some $i \in \{1, \ldots, n\}$ and some $\varepsilon > 0$, there exist
infinitely many $k$ such that
$$\|x - a_i\| \ge \varepsilon \quad \text{for all } x \in \Lambda_k.$$
From this subsequence we could as before pick a further subsequence $\Lambda_{k_v} \to \Lambda$.
Then every $y \in \Lambda$ is the limit of a sequence $y_v \in \Lambda_{k_v}$. Taking $y = a_i$, we obtain a
contradiction.


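It only remains to show that $a_{k1}, \ldots, a_{kn}$ is a basis of $\Lambda_k$ for all large $k$. Since
$$\lim_{k\to\infty}|\det(a_{k1}, \ldots, a_{kn})| = |\det(a_1, \ldots, a_n)| = d(\Lambda) = \lambda(V) = \lim_{k\to\infty}\lambda(V_k),$$
for all large $k$ we must have
$$0 < |\det(a_{k1}, \ldots, a_{kn})| < 2\lambda(V_k).$$
But if $a_{k1}, \ldots, a_{kn}$ were not a basis of $\Lambda_k$ for all large $k$, then for infinitely many $k$ we
would have
$$|\det(a_{k1}, \ldots, a_{kn})| \ge 2d(\Lambda_k) = 2\lambda(V_k),$$
since linearly independent vectors of $\Lambda_k$ which are not a basis generate a sublattice of
index at least 2. This is a contradiction.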
Proposition 20 has the following counterpart: 


Proposition 21 Let $\{\Lambda_k\}$ be a sequence of lattices in $\mathbb{R}^n$ and let $V_k$ be the Voronoi cell
of $\Lambda_k$. If there exists a lattice $\Lambda$ such that $\Lambda_k \to \Lambda$ as $k \to \infty$, and if $V$ is the Voronoi
cell of $\Lambda$, then $V_k \to V$ in the Hausdorff metric as $k \to \infty$.


Proof By hypothesis, there exists a basis $b_1, \ldots, b_n$ of $\Lambda$ and a basis $b_{k1}, \ldots, b_{kn}$ of
each $\Lambda_k$ such that $b_{kj} \to b_j$ as $k \to \infty$ $(j = 1, \ldots, n)$. Choose $R > 0$ so that the
fundamental parallelotope of $\Lambda$ is contained in the ball $B_R = \{x \in \mathbb{R}^n : \|x\| \le R\}$.
Then, for all $k \ge k_0$, the fundamental parallelotope of $\Lambda_k$ is contained in the ball $B_{2R}$.
It follows that, for all $k \ge k_0$, every point of $\mathbb{R}^n$ is distant at most $2R$ from some point
of $\Lambda_k$, and hence $V_k \subseteq B_{2R}$.

Consequently, by Blaschke’s selection principle, the sequence $\{V_k\}$ has a subsequence
$\{V_{k_v}\}$ which converges in the Hausdorff metric to a compact convex set $W$.
Moreover,
$$\lambda(W) = \lim_{v\to\infty}\lambda(V_{k_v}) = \lim_{v\to\infty} d(\Lambda_{k_v}) = d(\Lambda) > 0.$$
Consequently, since $W$ is convex, it has nonempty interior. It now follows from
Proposition 20, and the uniqueness of the limit of a convergent sequence of lattices,
that $W = V$.

Thus any convergent subsequence of $\{V_k\}$ has the same limit $V$. If the whole
sequence $\{V_k\}$ did not converge to $V$, there would exist $\rho > 0$ and a subsequence
$\{V_{k_v}\}$ such that
$$h(V_{k_v}, V) \ge \rho \quad \text{for all } v.$$
By the Blaschke selection principle again, this subsequence would itself have a
convergent subsequence. Since its limit must be $V$, this yields a contradiction.


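Suppose $\Lambda_k \in \mathcal{L}_n$ and $\Lambda_k \to \Lambda$ as $k \to \infty$. We will show that not only
$d(\Lambda_k) \to d(\Lambda)$, but also $m(\Lambda_k) \to m(\Lambda)$ as $k \to \infty$. Since every $x \in \Lambda$ is the limit
of a sequence $x_k \in \Lambda_k$, we must have $\limsup_{k\to\infty} m(\Lambda_k) \le m(\Lambda)$. On the other hand, by
Proposition 19, if $x_k \in \Lambda_k$ and $x_k \to x$, then $x \in \Lambda$. Hence $\liminf_{k\to\infty} m(\Lambda_k) \ge m(\Lambda)$,
since $x \ne 0$ if $x_k \ne 0$ for large $k$.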

Suppose now that a subset $\mathcal{F}$ of $\mathcal{L}_n$ has the property that any infinite sequence $\Lambda_k$
of lattices in $\mathcal{F}$ has a convergent subsequence. Then there exist positive constants $\rho$,
$\sigma$ such that
$$m(\Lambda) \ge \rho^2, \quad d(\Lambda) \le \sigma \quad \text{for all } \Lambda \in \mathcal{F}.$$
For otherwise there would exist a sequence $\Lambda_k$ of lattices in $\mathcal{F}$ such that either
$m(\Lambda_k) \to 0$ or $d(\Lambda_k) \to \infty$, and clearly this sequence could have no convergent
subsequence.

We now prove the fundamental compactness theorem of Mahler (1946), which says
that this necessary condition on $\mathcal{F}$ is also sufficient.


Proposition 22 If $\{\Lambda_k\}$ is a sequence of lattices in $\mathbb{R}^n$ such that
$$m(\Lambda_k) \ge \rho^2, \quad d(\Lambda_k) \le \sigma \quad \text{for all } k,$$
where $\rho, \sigma$ are positive constants, then the sequence $\{\Lambda_k\}$ certainly has a convergent
subsequence.


Proof Let $V_k$ denote the Voronoi cell of $\Lambda_k$. We show first that the ball $B_{\rho/2} = \{x \in \mathbb{R}^n : \|x\| \le \rho/2\}$
is contained in every Voronoi cell $V_k$. In fact if $\|x\| \le \rho/2$ then, for
every nonzero $y \in \Lambda_k$,
$$\|x - y\| \ge \|y\| - \|x\| \ge \rho - \rho/2 = \rho/2 \ge \|x\|,$$
and hence $x \in V_k$.

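Let $v_k$ be a point of $V_k$ which is furthest from the origin. Then $V_k$ contains the
convex hull $C_k$ of the set $\{v_k\} \cup B_{\rho/2}$. Since the volume of $V_k$ is bounded above by $\sigma$
(because $\lambda(V_k) = d(\Lambda_k)$), so also is the volume of $C_k$. But this implies that the
sequence $v_k$ is bounded, since $C_k$ contains a cone of height $\|v_k\|$ over an $(n-1)$-dimensional
disc of radius $\rho/2$, whose volume is proportional to $\|v_k\|$. Thus there
exists $R > 0$ such that the ball $B_R$ contains every Voronoi cell $V_k$.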

By Blaschke’s selection principle, the sequence $\{V_k\}$ has a subsequence $\{V_{k_v}\}$
which converges in the Hausdorff metric to a compact convex set $V$. Since $B_{\rho/2} \subseteq V$,
it follows from Proposition 20 that $\Lambda_{k_v} \to \Lambda$, where $\Lambda$ is a lattice with Voronoi cell $V$.


To illustrate the utility of Mahler’s compactness theorem, we now show that, as 
stated in Section 3, any compact symmetric convex set K with nonempty interior has 
a critical lattice. 

By the definition of the critical determinant $\Delta(K)$, there exists a sequence $\Lambda_k$
of lattices with no nonzero points in the interior of $K$ such that $d(\Lambda_k) \to \Delta(K)$ as
$k \to \infty$. Since $K$ contains a ball $B_\rho$ with radius $\rho > 0$, we have $m(\Lambda_k) \ge \rho^2$ for all $k$;
moreover, $d(\Lambda_k)$ is bounded, since it converges.
Hence, by Proposition 22, there is a subsequence $\Lambda_{k_v}$ which converges to a lattice $\Lambda$
as $v \to \infty$. Since every point of $\Lambda$ is a limit of points of $\Lambda_{k_v}$, no nonzero point of $\Lambda$
lies in the interior of $K$. Furthermore,
$$d(\Lambda) = \lim_{v\to\infty} d(\Lambda_{k_v}) = \Delta(K),$$
and hence $\Lambda$ is a critical lattice for $K$.


7 Further Remarks 


The geometry of numbers is treated more extensively in Cassels [11], Erdős et al. [22]
and Gruber and Lekkerkerker [27]. Minkowski’s own account is available in [42].
Numerous references to the earlier literature are given in Keller [34]. Lagarias [36]
gives an overview of lattice theory. For a simple proof that the indicator function of a 
convex set is Riemann integrable, see Szabo [57]. 

Diophantine approximation is studied in Cassels [12], Koksma [35] and 
Schmidt [50]. Minkowski’s result that the discriminant of an algebraic number field 
other than Q has absolute value greater than 1 is proved in Narkiewicz [44], 
for example. 

Minkowski’s theorem on successive minima is proved in Bambah et al. [3]. For the
results of Banaszczyk mentioned in §3, see [4] and [5]. Sharp forms of Siegel’s lemma
are proved not only in Bombieri and Vaaler [7], but also in Matveev [40]. The result of
Gillet and Soulé appeared in [25]. Some interesting results and conjectures concerning
the product $\lambda(K)\lambda(K^*)$ are described on pp. 425-427 of Schneider [51].

An algorithm of Lovász, which first appeared in Lenstra, Lenstra and Lovász [38],
produces in finitely many steps a basis for a lattice $\Lambda$ in $\mathbb{R}^n$ which is ‘reduced’.
Although the first vector of a reduced basis is in general not a minimal vector, it has
square-norm at most $2^{n-1}m(\Lambda)$. This suffices for many applications and the algorithm
has been used to solve a number of apparently unrelated computational problems,
such as factoring polynomials in $\mathbb{Q}[t]$, integer linear programming and simultaneous
Diophantine approximation. There is an account of the basis reduction algorithm in
Schrijver [52]. The algorithmic geometry of numbers is surveyed in Kannan [33].
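The paragraph above only cites the basis reduction algorithm. As a rough illustration, here is a minimal and deliberately unoptimized Python sketch of the textbook Lenstra-Lenstra-Lovász procedure. It is not the formulation of [38] or [52]. It uses the parameter $3/4$ in the exchange condition (the choice usually associated with the bound $2^{n-1}m(\Lambda)$) and exact rational arithmetic, and it recomputes the Gram-Schmidt data after every change, for clarity rather than speed. All names are illustrative.

```python
from fractions import Fraction


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(basis):
    """Exact Gram-Schmidt vectors b*_i and coefficients mu[i][j] (j < i)."""
    n = len(basis)
    bstar, mu = [], [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        v = [Fraction(x) for x in basis[i]]
        for j in range(i):
            mu[i][j] = dot(basis[i], bstar[j]) / dot(bstar[j], bstar[j])
            v = [a - mu[i][j] * b for a, b in zip(v, bstar[j])]
        bstar.append(v)
    return bstar, mu


def lll_reduce(basis, delta=Fraction(3, 4)):
    """Return a reduced basis of the lattice spanned by the rows of `basis`
    (assumed to be linearly independent integer vectors)."""
    b = [list(row) for row in basis]
    n = len(b)
    bstar, mu = gram_schmidt(b)
    k = 1
    while k < n:
        # Size reduction: make |mu[k][j]| <= 1/2 for all j < k.
        for j in range(k - 1, -1, -1):
            q = round(mu[k][j])
            if q != 0:
                b[k] = [x - q * y for x, y in zip(b[k], b[j])]
                bstar, mu = gram_schmidt(b)
        # Exchange condition; if it fails, swap b[k-1], b[k] and step back.
        if dot(bstar[k], bstar[k]) >= (delta - mu[k][k - 1] ** 2) * dot(bstar[k - 1], bstar[k - 1]):
            k += 1
        else:
            b[k - 1], b[k] = b[k], b[k - 1]
            bstar, mu = gram_schmidt(b)
            k = max(k - 1, 1)
    return b


original = [[1, 1, 1], [-1, 0, 2], [3, 5, 6]]
reduced = lll_reduce(original)
print(reduced)
```

Only integral row operations and swaps are used, so the reduced rows span the same lattice. On this small example the absolute determinant stays 3, and the first reduced vector satisfies the bound $\|b_1\|^2 \le 2^{n-1}m(\Lambda)$ quoted above.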

Mahler [39] has established an analogue of the geometry of numbers for formal
Laurent series with coefficients from an arbitrary field $F$, the roles of $\mathbb{Z}$, $\mathbb{Q}$ and $\mathbb{R}$
being taken by $F[t]$, $F(t)$ and $F((t))$. In particular, Eichler [19] has shown that the
Riemann-Roch theorem for algebraic functions may be thus derived by geometry of 
numbers arguments. 

There is also a generalization of Minkowski’s lattice point theorem to locally com- 
pact groups, with Haar measure taking the place of volume; see Chapter 2 (Lemma 1) 
of Weil [60]. 

Voronoi diagrams and their uses are surveyed in Aurenhammer [1]. Proofs of the
basic properties of polytopes referred to in §4 may be found in Brøndsted [9] and
Coppel [15]. Planar tilings are studied in detail in Grünbaum and Shephard [28].

Mathematical crystallography is treated in Schwarzenberger [53] and Engel [21]. 
For the physicist’s point of view, see Burckhardt [10], Janssen [32] and Birman [6]. 
There is much theoretical information, in addition to tables, in [31]. 

For Bieberbach’s theorems, see Vince [59], Charlap [13] and Milnor [41]. 
Various equivalent forms for the definitions of crystal and crystallographic group 
are given in Dolbilin et al. [17]. It is shown in Charlap [13] that crystallographic 
groups may be abstractly characterized as groups containing a finitely generated max- 
imal abelian torsion-free subgroup of finite index. (An abelian group is torsion-free 
if only the identity element has finite order.) The fundamental group of a compact 
flat Riemannian manifold is a torsion-free crystallographic group and all torsion- 
free crystallographic groups may be obtained in this way. For these connections with 
differential geometry, see Wolf [61] and Charlap [13]. 

In more than 4 dimensions the complete enumeration of all crystallographic groups
is no longer practicable. However, algorithms for deciding if two crystallographic
groups are equivalent in some sense have been developed by Opgenorth et al. [45].
An interesting subset of all crystallographic groups consists of those generated by
reflections in hyperplanes, since Stiefel (1941/2) showed that they are in 1-1 corre- 
spondence with the compact simply-connected semi-simple Lie groups. See the ‘Note 
historique’ in Bourbaki [8]. 

There has recently been considerable interest in tilings of $\mathbb{R}^n$ which, although not
lattice tilings, consist of translates of finitely many $n$-dimensional polytopes. The first
example, in $\mathbb{R}^2$, due to Penrose (1974), was explained more algebraically by de Bruijn
(1981). A substantial generalization of de Bruijn’s construction was given by Katz
and Duneau (1986), who showed that many such ‘quasiperiodic’ tilings may be
obtained by a method of cut and projection from ordinary lattices in a higher-dimensional
space. The subject gained practical significance with the discovery by Shechtman et al.
(1984) that the diffraction pattern of an alloy of aluminium and manganese has
icosahedral symmetry, which is impossible for a crystal. Many other ‘quasicrystals’ have
since been found. The papers referred to are reproduced, with others, in Steinhardt and 
Ostlund [56]. The mathematical theory of quasicrystals is surveyed in Le et al. [37]. 

Skubenko [54] has given an upper bound for Hermite’s constant $\gamma_n$. Somewhat
sharper bounds are known, but they have the same asymptotic behaviour and the proofs
are much more complicated. A lower bound for $\gamma_n$ was obtained with a new method
by Ball [2].

For the densest lattices in $\mathbb{R}^n$ $(n \le 8)$, see Ryshkov and Baranovskii [49]. The
enumeration of all root lattices is carried out in Ebeling [18]. (A more general prob- 
lem is treated in Chap. 3 of Humphreys [30] and in Chap. 6 of Bourbaki [8].) For the 
Voronoi cells of root lattices, see Chap. 21 of Conway and Sloane [14] and Moody and 
Patera [43]. For the Dynkin diagrams associated with root lattices, see also Reiten [47]. 

Rajan and Shende [46] characterize root lattices as those lattices for which every 
facet vector is a minimal vector, but their definition of root lattice is not that adopted 
here. Their argument shows that if every facet vector of a lattice is a minimal vector 
then, after scaling to make the minimal vectors have square-norm 2, it is a root lattice 
in our sense. 

There is a fund of information about lattice packings of balls in Conway and 
Sloane [14]. See also Thompson [58] for the Leech lattice and Coxeter [16] for the 
kissing number problem. 

We have restricted attention to lattice packings and, in particular, to lattice pack- 
ings of balls. Lattice packings of other convex bodies are discussed in the books on 
geometry of numbers cited above. Non-lattice packings have also been much studied. 
The notion of density is not so intuitive in this case and it should be realized that the 
density is unaltered if finitely many sets are removed from the packing. 

Packings and coverings are discussed in the texts of Rogers [48] and 
Fejes Toth [23], [24]. For packings of balls, see also Zong [62]. Sloane [55] and 
Elkies [20] provide introductions to the connections between lattice packings of balls 
and coding theory. 

The third part of Hilbert’s 18th problem, which is surveyed in Milnor [41], deals
with the densest lattice or non-lattice packing of balls in $\mathbb{R}^n$. It is known that, for
$n = 2$, the densest lattice packing is also a densest packing. The original proof by
Thue (1882/1910) was incomplete, but a complete proof was given by L. Fejes Toth
(1940). The famous Kepler conjecture asserts that, also for $n = 3$, the densest lattice
packing is a densest packing. A computer-aided proof has recently been announced by
Hales [29]. It is unknown if the same holds for any $n > 3$.


Propositions 20 and 21 are due to Groemer [26], and are of interest quite apart from
the application to Mahler’s compactness theorem. Other proofs of the latter are given
in Cassels [11] and Gruber and Lekkerkerker [27]. Blaschke’s selection principle and
Rådström’s cancellation law are proved in [15] and [51], for example.
IX


The Number of Prime Numbers 


1 Finding the Problem 


It was already shown in Euclid’s Elements (Book IX, Proposition 20) that there are
infinitely many prime numbers. The proof is a model of simplicity: let $p_1, \ldots, p_n$ be
any finite set of primes and consider the integer $N = p_1 \cdots p_n + 1$. Then $N > 1$ and
each prime divisor $p$ of $N$ is distinct from $p_1, \ldots, p_n$, since $p = p_j$ would imply that
$p$ divides $N - p_1 \cdots p_n = 1$. It is worth noting that the same argument applies if we
take $N = p_1^{\alpha_1} \cdots p_n^{\alpha_n} + 1$, with any positive integers $\alpha_1, \ldots, \alpha_n$.
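Euclid’s construction is easy to run. The Python sketch below (hypothetical helper names) forms $N = p_1^{\alpha_1}\cdots p_n^{\alpha_n} + 1$ and returns its least prime factor, which by the argument above is a new prime. Note that $N$ itself need not be prime: for the first six primes, $N = 30031 = 59 \cdot 509$.

```python
def smallest_prime_factor(n):
    """Least prime divisor of an integer n >= 2, by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n


def euclid_new_prime(primes, exponents=None):
    """Return (N, q), where N = prod p_j^a_j + 1 and q is its least prime
    factor; by Euclid's argument q differs from every p_j."""
    if exponents is None:
        exponents = [1] * len(primes)
    N = 1
    for p, a in zip(primes, exponents):
        N *= p ** a
    N += 1
    q = smallest_prime_factor(N)
    assert q not in primes      # q divides N, and no p_j does
    return N, q


print(euclid_new_prime([2, 3, 5, 7, 11, 13]))
print(euclid_new_prime([2, 3], [2, 1]))

# Starting from [2] and repeatedly adjoining the new prime gives 2, 3, 7, 43, 13, 53, 5.
seq = [2]
for _ in range(6):
    seq.append(euclid_new_prime(seq)[1])
print(seq)
```

Repeating the construction in this way produces new primes in no particular order, which illustrates why Euclid’s proof gives no information about how the primes are distributed.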

Euler (1737) gave an analytic proof of Euclid’s result, which provides also quanti- 
tative information about the distribution of primes: 


Proposition 1 The series $\sum_p 1/p$, where $p$ runs through all primes, is divergent.
Proof For any prime $p$ we have
$$(1 - 1/p)^{-1} = 1 + p^{-1} + p^{-2} + \cdots$$
and hence
$$\prod_{p \le x}(1 - 1/p)^{-1} = \prod_{p \le x}(1 + p^{-1} + p^{-2} + \cdots) \ge \sum_{n \le x} 1/n,$$
since any positive integer $n \le x$ is a product of powers of primes $p \le x$. Since
$$\sum_{n \le x} 1/n \ge \sum_{n \le x}\int_n^{n+1} dt/t \ge \log x,$$
it follows that
$$\prod_{p \le x}(1 - 1/p)^{-1} \ge \log x.$$

On the other hand, since the representation of any positive integer as a product of
prime powers is unique,
$$\prod_{p \le x}(1 - 1/p^2)^{-1} = \prod_{p \le x}(1 + p^{-2} + p^{-4} + \cdots) \le \sum_{n=1}^{\infty} n^{-2} = S,$$
and
$$S = 1 + \sum_{n=1}^{\infty} 1/(n+1)^2 < 1 + \sum_{n=1}^{\infty}\int_n^{n+1} dt/t^2 = 1 + \int_1^{\infty} dt/t^2 = 2.$$
(In fact $S = \pi^2/6$, as Euler (1735) also showed.) Since $1 - 1/p^2 = (1 - 1/p)(1 + 1/p)$,
and since $1 + x \le e^x$, it follows that
$$\prod_{p \le x}(1 - 1/p)^{-1} = \prod_{p \le x}(1 - 1/p^2)^{-1}(1 + 1/p) \le S\prod_{p \le x}(1 + 1/p) \le S\,e^{\sum_{p \le x} 1/p}.$$
Combining this with the inequality of the previous paragraph, we obtain
$$\sum_{p \le x} 1/p \ge \log\log x - \log S.$$


Since the series $\sum_{n=1}^{\infty} 1/n^2$ is convergent, Proposition 1 says that ‘there are more
primes than squares’. Proposition 1 can be made more precise. It was shown by
Mertens (1874) that


$$\sum_{p \le x} 1/p = \log\log x + c + O(1/\log x),$$
where $c$ is a constant ($c = 0.261497\ldots$).
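Both the lower bound of Proposition 1 and Mertens’ formula can be observed numerically. The Python sketch below sieves the primes up to $x$ and compares $\sum_{p\le x} 1/p$ with $\log\log x - \log S$, taking $S = \pi^2/6$. It also prints the difference $\sum_{p\le x}1/p - \log\log x$, which by Mertens’ result should approach $c = 0.261497\ldots$. Function names are illustrative.

```python
import math


def primes_up_to(x):
    """Sieve of Eratosthenes: the sorted list of primes p <= x (x >= 2)."""
    is_prime = bytearray([1]) * (x + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, math.isqrt(x) + 1):
        if is_prime[i]:
            is_prime[i * i :: i] = bytes(len(range(i * i, x + 1, i)))
    return [i for i in range(x + 1) if is_prime[i]]


def reciprocal_prime_sum(x):
    return sum(1.0 / p for p in primes_up_to(x))


S = math.pi ** 2 / 6      # sum of 1/n^2, as in the proof
MERTENS_C = 0.261497      # the constant c quoted in the text (truncated)

for x in (10, 10**3, 10**5):
    s = reciprocal_prime_sum(x)
    lower = math.log(math.log(x)) - math.log(S)      # bound from Proposition 1
    print(x, round(s, 4), round(lower, 4), round(s - math.log(math.log(x)), 4))
```

The lower bound is far from sharp: for large $x$ the sum exceeds it by about $c + \log S \approx 0.76$.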
Let $\pi(x)$ denote the number of primes $\le x$:
$$\pi(x) = \sum_{p \le x} 1.$$

It may be asked whether $\pi(x)$ has some simple asymptotic behaviour as $x \to \infty$. It
is not obvious that this is a sensible question. The behaviour of $\pi(x)$ for small values
of $x$ is quite irregular. Moreover the sequence of positive integers contains arbitrarily
large blocks without primes; for example, none of the integers
$$n! + 2,\ n! + 3,\ \ldots,\ n! + n$$
is a prime, since $k$ divides $n! + k$ for $2 \le k \le n$. Indeed Euler (1751) expressed the view
that “there reigns neither order nor rule” in the sequence of prime numbers.
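The prime-free blocks are easy to exhibit. The Python sketch below (illustrative names) lists each of the $n - 1$ consecutive integers $n! + 2, \ldots, n! + n$ together with the divisor $k$ of $n! + k$, and confirms by trial division that none of them is prime.

```python
import math


def is_prime(m):
    """Trial-division primality test."""
    if m < 2:
        return False
    return all(m % d for d in range(2, math.isqrt(m) + 1))


def prime_free_run(n):
    """Pairs (n! + k, k) for 2 <= k <= n; k divides n! + k because k divides n!."""
    f = math.factorial(n)
    return [(f + k, k) for k in range(2, n + 1)]


run = prime_free_run(6)
print(run)
print(any(is_prime(m) for m, _ in run))
```

For $n = 6$ this gives the five consecutive composite numbers $722, \ldots, 726$.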

From an analysis of tables of primes Legendre (1798) was led to conjecture that,
for large values of $x$, $\pi(x)$ is given approximately by the formula
$$x/(A\log x - B),$$
where $A, B$ are constants and $\log x$ again denotes the natural logarithm of $x$ (i.e., to
the base $e$). In 1808 he proposed the specific values $A = 1$, $B = 1.08366$.

The first significant results on the asymptotic behaviour of $\pi(x)$ were obtained by
Chebyshev (1849). He proved that, for each positive integer $n$,
$$\liminf_{x\to\infty}\left(\pi(x) - \int_2^x dt/\log t\right)\log^n x/x \le 0 \le \limsup_{x\to\infty}\left(\pi(x) - \int_2^x dt/\log t\right)\log^n x/x,$$
where $\log^n x = (\log x)^n$. By repeatedly integrating by parts it may be seen that, for
each positive integer $n$,
$$\int_2^x dt/\log t = \{1 + 1!/\log x + 2!/\log^2 x + \cdots + (n-1)!/\log^{n-1} x\}\,x/\log x + n!\int_2^x dt/\log^{n+1} t + c_n,$$
2 


where c, is a constant. Moreover, using the Landau order symbol defined under 
‘Notations’, 


x 
| dt/log"*! t = O(x/log"*! x), 
2 


since 


x12 


x 
i dtflog" 4 <a°" flag" 2, / dt/log"t! t < 2"t!x/log"t! x. 
2 x1/2 


Thus Chebyshev’s result shows that $A = B = 1$ are the best possible values for a
formula of Legendre’s type and suggests that

$$\mathrm{Li}(x) = \int_2^x dt/\log t$$

is a better approximation to $\pi(x)$.
If we interpret this approximation as an asymptotic formula, then it implies that
$\pi(x) \log x/x \to 1$ as $x \to \infty$, i.e., using another Landau order symbol,

$$\pi(x) \sim x/\log x. \qquad (1)$$

The validity of the relation (1) is now known as the prime number theorem. If the $n$-th
prime is denoted by $p_n$, then the prime number theorem can also be stated in the form
$p_n \sim n \log n$:
Proposition 2 $\pi(x) \sim x/\log x$ if and only if $p_n \sim n \log n$.
Proof If $\pi(x) \log x/x \to 1$, then

$$\log \pi(x) + \log\log x - \log x \to 0$$

and hence

$$\log \pi(x)/\log x \to 1.$$

Consequently

$$\pi(x) \log \pi(x)/x = \pi(x) \log x/x \cdot \log \pi(x)/\log x \to 1.$$

Since $\pi(p_n) = n$, this shows that $p_n \sim n \log n$.
Conversely, suppose $p_n/n \log n \to 1$. Since

$$(n+1) \log(n+1)/n \log n = (1 + 1/n)\{1 + \log(1 + 1/n)/\log n\} \to 1,$$

it follows that $p_{n+1}/p_n \to 1$. Furthermore

$$\log p_n - \log n - \log\log n \to 0,$$

and hence

$$\log p_n/\log n \to 1.$$

If $p_n \le x < p_{n+1}$, then $\pi(x) = n$ and

$$n \log p_n/p_{n+1} < \pi(x) \log x/x < n \log p_{n+1}/p_n.$$

Since

$$n \log p_n/p_{n+1} = p_n/p_{n+1} \cdot n \log n/p_n \cdot \log p_n/\log n \to 1$$

and similarly $n \log p_{n+1}/p_n \to 1$, it follows that also $\pi(x) \log x/x \to 1$.


Numerical evidence, both for the prime number theorem and for the fact that $\mathrm{Li}(x)$
is a better approximation than $x/\log x$ to $\pi(x)$, is provided by Table 1.
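The rows of Table 1 up to $x = 10^6$ can be recomputed with a sieve and the classical series $\mathrm{li}(x) = \gamma + \log\log x + \sum_{k \ge 1} (\log x)^k/(k \cdot k!)$. The sketch below assumes that the text's $\mathrm{Li}(x) = \int_2^x dt/\log t$ equals $\mathrm{li}(x) - \mathrm{li}(2)$, where $\mathrm{li}$ is the principal-value logarithmic integral; the helper names are ours, and the printed decimals may differ from the table's rounded or truncated entries in the last digit.

```python
import bisect
import math

EULER_GAMMA = 0.5772156649015329

def primes_up_to(n):
    """Sieve of Eratosthenes: list of all primes p <= n."""
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if is_prime[i]:
            is_prime[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i, flag in enumerate(is_prime) if flag]

def li(x):
    """Principal-value logarithmic integral for x > 1, from
    li(x) = gamma + log log x + sum_{k >= 1} (log x)^k / (k * k!)."""
    L = math.log(x)
    total, term, k = 0.0, 1.0, 0
    while True:
        k += 1
        term *= L / k                  # term is now L**k / k!
        total += term / k
        if term / k < 1e-16 * total:   # remaining terms are negligible
            return EULER_GAMMA + math.log(L) + total

def Li(x):
    """The text's Li(x), the integral of dt/log t from 2 to x."""
    return li(x) - li(2.0)

primes = primes_up_to(10**6)
print(f"{'x':>8} {'pi(x)':>7} {'x/log x':>10} {'Li(x)':>10} {'pi*log x/x':>11} {'pi/Li':>8}")
for e in range(3, 7):
    x = 10**e
    count = bisect.bisect_right(primes, x)   # pi(x)
    print(f"{x:>8} {count:>7} {x / math.log(x):>10.1f} {Li(x):>10.1f} "
          f"{count * math.log(x) / x:>11.5f} {count / Li(x):>8.5f}")
```

The larger rows of the table need a segmented sieve or a combinatorial prime-counting method; a flat sieve is used here only because it is short.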

In a second paper Chebyshev (1852) made some progress towards proving the
prime number theorem by showing that

$$a \le \liminf_{x \to \infty} \pi(x) \log x/x \le \limsup_{x \to \infty} \pi(x) \log x/x \le 6a/5,$$

where $a = 0.92129$. He used his results to give the first proof of Bertrand’s postulate:
for every real $x > 1$, there is a prime between $x$ and $2x$.
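Bertrand's postulate can at least be spot-checked at the integers. The sketch below (function names are ours) looks, for each integer $2 \le n \le 10^5$, for a prime $p$ with $n < p < 2n$; it tests integers only, so it illustrates the statement rather than proving it.

```python
import bisect
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: list of all primes p <= n."""
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if is_prime[i]:
            is_prime[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i, flag in enumerate(is_prime) if flag]

def bertrand_counterexample(limit):
    """Return the first integer n in [2, limit] with no prime p, n < p < 2n, or None."""
    primes = primes_up_to(2 * limit)
    for n in range(2, limit + 1):
        i = bisect.bisect_right(primes, n)   # index of the least prime > n
        if i == len(primes) or primes[i] >= 2 * n:
            return n
    return None

print(bertrand_counterexample(10**5))   # None: no counterexample up to 10^5
```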
New ideas were introduced by Riemann (1859), who linked the asymptotic behaviour
of $\pi(x)$ with the behaviour of the function

$$\zeta(s) = \sum_{n=1}^{\infty} n^{-s}$$


Table 1.

x        π(x)           x/log x        Li(x)          π(x) log x/x   π(x)/Li(x)
10^3     168            144.           177.           1.16           0.94
10^4     1 229          1 085.         1 245.         1.132          0.987
10^5     9 592          8 685.         9 629.         1.1043         0.9961
10^6     78 498         72 382.        78 627.        1.08449        0.99835
10^7     664 579        620 420.       664 917.       1.07117        0.99949
10^8     5 761 455      5 428 681.     5 762 208.     1.06130        0.99987
10^9     50 847 534     48 254 942.    50 849 234.    1.05373        0.999966
10^10    455 052 511    434 294 481.   455 055 614.   1.04780        0.999993
for complex values of $s$. By developing these ideas, and by showing especially that
$\zeta(s)$ has no zeros on the line $\Re s = 1$, Hadamard and de la Vallée Poussin proved
the prime number theorem (independently) in 1896. Shortly afterwards de la Vallée
Poussin (1899) confirmed that $\mathrm{Li}(x)$ was a better approximation than $x/\log x$ to $\pi(x)$
by proving (in particular) that

$$\pi(x) = \mathrm{Li}(x) + O(x/\log^a x) \quad \text{for every } a > 0. \qquad (2)$$


Better error bounds than de la Vallée Poussin’s have since been obtained, but they still 
fall far short of what is believed to be true. 

Another approach to the prime number theorem was found by Wiener (1927-1933),
as an application of his general theory of Tauberian theorems. A convenient
form for this application was given by Ikehara (1931), and Bochner (1933) showed
that in this case Wiener’s general theory could be avoided.

It came as a great surprise to the mathematical community when in 1949 Selberg,
assisted by Erdős, found a new proof of the prime number theorem which uses only
the simplest facts of real analysis. Though elementary in a technical sense, this proof
was still quite complicated. As a result of several subsequent simplifications it can
now be given quite a clear and simple form. Nevertheless the Wiener-Ikehara proof
will be presented here on account of its greater versatility. The error bound (2) can be
obtained by both the Wiener and Selberg approaches, in the latter case at the cost of
considerable complication.


2 Chebyshev’s Functions 


In his second paper Chebyshev introduced two functions

$$\theta(x) = \sum_{p \le x} \log p, \qquad \psi(x) = \sum_{p^{\alpha} \le x} \log p,$$

which have since played a major role. Although $\psi(x)$ has the most complicated
definition, it is easier to treat analytically than either $\theta(x)$ or $\pi(x)$. As we will show,
the asymptotic behaviour of $\theta(x)$ is essentially the same as that of $\psi(x)$, and the
asymptotic behaviour of $\pi(x)$ may be deduced without difficulty from that of $\theta(x)$.
Evidently

$$\theta(x) = \psi(x) = 0 \quad \text{for } x < 2$$

and

$$0 < \theta(x) \le \psi(x) \quad \text{for } x \ge 2.$$


Lemma 3 The asymptotic behaviours of $\psi(x)$ and $\theta(x)$ are connected by

(i) $\psi(x) - \theta(x) = O(x^{1/2} \log^2 x)$;
(ii) $\psi(x) = O(x)$ if and only if $\theta(x) = O(x)$, and in this case $\psi(x) - \theta(x) =
O(x^{1/2} \log x)$.
Proof Since

$$\psi(x) = \sum_{p \le x} \log p + \sum_{p^2 \le x} \log p + \cdots$$

and $k > \log x/\log 2$ implies $x^{1/k} < 2$, we have

$$\psi(x) = \theta(x) + \theta(x^{1/2}) + \cdots + \theta(x^{1/m}),$$

where $m = \lfloor \log x/\log 2 \rfloor$. (As is now usual, we denote by $\lfloor y \rfloor$ the greatest integer
$\le y$.) But it is obvious from the definition of $\theta(x)$ that $\theta(x) = O(x \log x)$. Hence

$$\psi(x) - \theta(x) = O\Big(\sum_{2 \le k \le m} x^{1/k} \log x\Big) = O(x^{1/2} \log^2 x).$$

If $\theta(x) = O(x)$ the same argument yields $\psi(x) - \theta(x) = O(x^{1/2} \log x)$ and thus
$\psi(x) = O(x)$. It is trivial that $\psi(x) = O(x)$ implies $\theta(x) = O(x)$.


The proof of Lemma 3 shows also that
$$\psi(x) = \theta(x) + \theta(x^{1/2}) + O(x^{1/3} \log^2 x).$$
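The exact identity $\psi(x) = \theta(x) + \theta(x^{1/2}) + \cdots + \theta(x^{1/m})$ from the proof of Lemma 3 can be checked numerically. In the sketch below (helper names are ours) $\theta$ is read off from prefix sums of $\log p$, $\psi$ is computed directly from its definition, and `iroot` supplies $\lfloor x^{1/k} \rfloor$ in exact integer arithmetic, which is all $\theta$ needs.

```python
import bisect
import math
from itertools import accumulate

def primes_up_to(n):
    """Sieve of Eratosthenes: list of all primes p <= n."""
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if is_prime[i]:
            is_prime[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i, flag in enumerate(is_prime) if flag]

N = 10**6
PRIMES = primes_up_to(N)
LOG_PREFIX = [0.0] + list(accumulate(math.log(p) for p in PRIMES))

def theta(y):
    """theta(y) = sum of log p over primes p <= y (for 0 <= y <= N)."""
    return LOG_PREFIX[bisect.bisect_right(PRIMES, y)]

def psi(x):
    """psi(x) = sum of log p over prime powers p^k <= x (for integer x <= N)."""
    total = 0.0
    for p in PRIMES:
        if p > x:
            break
        pk = p
        while pk <= x:           # one log p for each k with p^k <= x
            total += math.log(p)
            pk *= p
    return total

def iroot(x, k):
    """Largest integer r with r**k <= x."""
    r = int(round(x ** (1.0 / k)))
    while r ** k > x:
        r -= 1
    while (r + 1) ** k <= x:
        r += 1
    return r

x = 10**6
m = x.bit_length() - 1           # m = floor(log x / log 2)
sum_thetas = sum(theta(iroot(x, k)) for k in range(1, m + 1))
print(psi(x), sum_thetas, psi(x) - theta(x), math.sqrt(x) * math.log(x))
```

The two sums agree up to floating-point rounding, and $\psi(x) - \theta(x)$ is of the size of $x^{1/2}$, well inside the bound of Lemma 3.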
Lemma 4 $\psi(x) = O(x)$ if and only if $\pi(x) = O(x/\log x)$, and then
$$\pi(x) \log x/x = \psi(x)/x + O(1/\log x).$$


Proof Although their use can easily be avoided, it is more suggestive to use Stieltjes
integrals. Suppose first that $\psi(x) = O(x)$. For any $x > 2$ we have

$$\pi(x) = \int_{2-}^{x+} 1/\log t \; d\theta(t)$$

and hence, on integrating by parts,

$$\pi(x) = \theta(x)/\log x + \int_2^x \theta(t)/(t \log^2 t) \, dt.$$

But

$$\int_2^x \theta(t)/(t \log^2 t) \, dt = O(x/\log^2 x),$$

since $\theta(t) = O(t)$ and, as we saw in §1,

$$\int_2^x dt/\log^2 t = O(x/\log^2 x).$$

Since

$$\theta(x)/\log x = \psi(x)/\log x + O(x^{1/2}),$$

by Lemma 3, it follows that

$$\pi(x) = \psi(x)/\log x + O(x/\log^2 x).$$

Suppose next that $\pi(x) = O(x/\log x)$. For any $x > 2$ we have

$$\theta(x) = \int_{2-}^{x+} \log t \; d\pi(t) = \pi(x) \log x - \int_2^x \pi(t)/t \, dt = O(x),$$

and hence also $\psi(x) = O(x)$, by Lemma 3.


It follows at once from Lemma 4 that the prime number theorem, $\pi(x) \sim x/\log x$,
is equivalent to $\psi(x) \sim x$.
The method of argument used in Lemma 4 can be carried further. Put

$$\theta(x) = x + R(x), \qquad \pi(x) = \int_2^x dt/\log t + Q(x).$$

Subtracting

$$\int_2^x dt/\log t = x/\log x - 2/\log 2 + \int_2^x dt/\log^2 t$$

from

$$\pi(x) = \theta(x)/\log x + \int_2^x \theta(t)/(t \log^2 t) \, dt,$$

we obtain

$$Q(x) = R(x)/\log x + \int_2^x R(t)/(t \log^2 t) \, dt + 2/\log 2. \qquad (3)_1$$
2 


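Also, since

$$\int_2^x \Big(\int_2^t du/\log u\Big) dt/t = \int_2^x \Big(\int_u^x dt/t\Big) du/\log u = \int_2^x (\log x - \log u)\, du/\log u = \log x \int_2^x dt/\log t - x + 2,$$

on subtracting this identity from

$$\theta(x) = \pi(x) \log x - \int_2^x \pi(t)/t \, dt$$

and rearranging, we obtain

$$R(x) = Q(x) \log x - \int_2^x Q(t)/t \, dt - 2. \qquad (3)_2$$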
It follows from $(3)_1$ and $(3)_2$ that $R(x) = O(x/\log^a x)$ for some $a > 0$ if and only if
$Q(x) = O(x/\log^{a+1} x)$. Consequently, by Lemma 3,

$$\psi(x) = x + O(x/\log^a x) \quad \text{for every } a > 0$$

if and only if

$$\pi(x) = \int_2^x dt/\log t + O(x/\log^a x) \quad \text{for every } a > 0,$$

and $\pi(x)$ then has the asymptotic expansion

$$\pi(x) \sim \{1 + 1!/\log x + 2!/\log^2 x + \cdots\}\, x/\log x,$$

the error in breaking off the series after any finite number of terms having the order of
magnitude of the first term omitted.
It follows from $(3)_1$ and $(3)_2$ also that, for a given $a$ such that $1/2 \le a < 1$,

$$\psi(x) = x + O(x^a \log^2 x)$$

if and only if

$$\pi(x) = \int_2^x dt/\log t + O(x^a \log x).$$
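The definition of $\psi(x)$ can be put in the form

$$\psi(x) = \sum_{n \le x} \Lambda(n),$$

where the von Mangoldt function $\Lambda(n)$ is defined by

$$\Lambda(n) = \begin{cases} \log p & \text{if } n = p^{\alpha} \text{ for some prime } p \text{ and some } \alpha > 0, \\ 0 & \text{otherwise.} \end{cases}$$

For any positive integer $n$ we have

$$\log n = \sum_{d \mid n} \Lambda(d), \qquad (4)$$

since if $n = p_1^{\alpha_1} \cdots p_s^{\alpha_s}$ is the factorization of $n$ into powers of distinct primes, then
the divisors $d$ of $n$ with $\Lambda(d) \ne 0$ are the prime powers $p_j^k$ with $1 \le k \le \alpha_j$, each contributing $\log p_j$, and

$$\log n = \sum_{j=1}^{s} \alpha_j \log p_j.$$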
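Formula (4) is easy to confirm for small $n$. A minimal sketch, assuming trial division is fast enough for $n$ in the hundreds (the function names are ours):

```python
import math

def von_mangoldt(n):
    """Lambda(n) = log p if n = p^a with p prime and a >= 1, else 0 (trial division)."""
    if n < 2:
        return 0.0
    d = 2
    while d * d <= n:
        if n % d == 0:               # d is the least prime factor of n
            while n % d == 0:        # strip every factor d
                n //= d
            return math.log(d) if n == 1 else 0.0
        d += 1
    return math.log(n)               # no factor up to sqrt(n): n is prime

def divisor_sum_of_lambda(n):
    """Right-hand side of (4): sum of Lambda(d) over the divisors d of n."""
    return math.fsum(von_mangoldt(d) for d in range(1, n + 1) if n % d == 0)

for n in (1, 12, 360, 1024, 997):
    print(n, divisor_sum_of_lambda(n), math.log(n))
```

For $n = 360 = 2^3 \cdot 3^2 \cdot 5$, for instance, the contributing divisors are $2, 4, 8, 3, 9, 5$, giving $3 \log 2 + 2 \log 3 + \log 5 = \log 360$.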


3 Proof of the Prime Number Theorem 


The Riemann zeta-function is defined by

$$\zeta(s) = \sum_{n=1}^{\infty} n^{-s}. \qquad (5)$$

This infinite series had already been considered by Euler, Dirichlet and Chebyshev,
but Riemann was the first to study it for complex values of $s$. As customary, we write
$s = \sigma + it$, where $\sigma$ and $t$ are real, and $n^{-s}$ is defined for complex values of $s$ by

$$n^{-s} = e^{-s \log n} = n^{-\sigma}(\cos(t \log n) - i \sin(t \log n)).$$


To show that the series (5) converges in the half-plane $\sigma > 1$ we compare as
in §1 the sum with an integral. If $\lfloor x \rfloor$ denotes again the greatest integer $\le x$, then on
integrating by parts we obtain

$$\int_1^N x^{-s} \, dx - \sum_{n=1}^{N} n^{-s} = \int_{1-}^{N+} x^{-s} \, d\{x - \lfloor x \rfloor\} = -1 + s \int_1^N x^{-s-1}\{x - \lfloor x \rfloor\} \, dx.$$

Since

$$\int_1^N x^{-s} \, dx = (1 - N^{1-s})/(s - 1),$$

by letting $N \to \infty$ we see that $\zeta(s)$ is defined for $\sigma > 1$ and

$$\zeta(s) = 1/(s - 1) + 1 - s \int_1^{\infty} x^{-s-1}\{x - \lfloor x \rfloor\} \, dx.$$

But, since $x - \lfloor x \rfloor$ is bounded, the integral on the right is uniformly convergent in any
half-plane $\sigma \ge \delta > 0$. It follows that the definition of $\zeta(s)$ can be extended to the
half-plane $\sigma > 0$, so that it is holomorphic there except for a simple pole with residue 1 at
$s = 1$.

The connection between the zeta-function and prime numbers is provided by
Euler’s product formula, which may be viewed as an analytic version of the fundamental
theorem of arithmetic:

Proposition 5 $\zeta(s) = \prod_p (1 - p^{-s})^{-1}$ for $\sigma > 1$, where the product is taken over all
primes $p$.

Proof For $\sigma > 0$ we have

$$(1 - p^{-s})^{-1} = 1 + p^{-s} + p^{-2s} + \cdots.$$

Since each positive integer can be uniquely expressed as a product of powers of distinct
primes, it follows that

$$\prod_{p \le x} (1 - p^{-s})^{-1} = \sum_{n \in N_x} n^{-s},$$

where $N_x$ is the set of all positive integers, including 1, whose prime factors are all
$\le x$. But $N_x$ contains all positive integers $\le x$. Hence

$$\Big|\zeta(s) - \prod_{p \le x} (1 - p^{-s})^{-1}\Big| \le \sum_{n > x} n^{-\sigma} \quad \text{for } \sigma > 1,$$

and the sum on the right tends to zero as $x \to \infty$.
It follows at once from Proposition 5 that $\zeta(s) \ne 0$ for $\sigma > 1$, since the infinite
product is convergent and each factor is nonzero.
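The error bound in the proof of Proposition 5 can be seen numerically at $s = 2$, where $\zeta(2) = \pi^2/6$ is known. The sketch below (our own helper names) forms the partial product over $p \le x$ and checks that the gap lies below $\sum_{n > x} n^{-2} < 1/x$.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: list of all primes p <= n."""
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if is_prime[i]:
            is_prime[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i, flag in enumerate(is_prime) if flag]

def euler_partial_product(s, x):
    """Partial Euler product: prod over primes p <= x of (1 - p^(-s))^(-1)."""
    product = 1.0
    for p in primes_up_to(x):
        product /= 1.0 - p ** (-s)
    return product

ZETA_2 = math.pi ** 2 / 6
for x in (10, 100, 1000, 10000):
    gap = ZETA_2 - euler_partial_product(2, x)
    # Proposition 5's bound: 0 < gap <= sum_{n > x} n^(-2) < 1/x
    print(x, gap, 1 / x)
```

The gap is positive because the partial product sums $n^{-2}$ only over $n \in N_x$.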


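Proposition 6 $-\zeta'(s)/\zeta(s) = \sum_{n=1}^{\infty} \Lambda(n)/n^s$ for $\sigma > 1$, where $\Lambda(n)$ denotes von
Mangoldt’s function.

Proof The series $\omega(s) = \sum_{n=1}^{\infty} \Lambda(n) n^{-s}$ converges absolutely and uniformly in any
half-plane $\sigma \ge 1 + \varepsilon$, where $\varepsilon > 0$, since

$$0 \le \Lambda(n) \le \log n < n^{\varepsilon/2} \quad \text{for all large } n.$$

Hence

$$\zeta(s)\omega(s) = \sum_{m=1}^{\infty} m^{-s} \sum_{k=1}^{\infty} \Lambda(k) k^{-s} = \sum_{n=1}^{\infty} n^{-s} \sum_{d \mid n} \Lambda(d).$$

Since $\sum_{d \mid n} \Lambda(d) = \log n$, by (4), it follows that

$$\zeta(s)\omega(s) = \sum_{n=1}^{\infty} n^{-s} \log n = -\zeta'(s).$$

Since $\zeta(s) \ne 0$ for $\sigma > 1$, the result follows. However, we can also prove directly that
$\zeta(s) \ne 0$ for $\sigma > 1$, and thus make the proof of the prime number theorem independent
of Proposition 5.

Obviously if $\zeta(s_0) = 0$ for some $s_0$ with $\Re s_0 > 1$ then $\zeta'(s_0) = -\zeta(s_0)\omega(s_0) = 0$, and it follows
by induction from Leibniz’ formula for derivatives of the product $\zeta(s)\omega(s)$ that $\zeta^{(n)}(s_0) = 0$ for
all $n \ge 0$. Since $\zeta(s)$ is holomorphic for $\sigma > 1$ and not identically zero, this is a
contradiction.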
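Proposition 6 can be tested at $s = 2$ without a closed form for $\zeta'(2)$: both $\sum \Lambda(n)/n^2$ and $-\zeta'(2)/\zeta(2) = \sum_{n \ge 2} (\log n)/n^2 \big/ (\pi^2/6)$ can be summed directly. The sketch below assumes that the tail $\sum_{n > N} (\log n)/n^2$ is well approximated by $\int_N^{\infty} (\log t)/t^2 \, dt = (\log N + 1)/N$, and it leaves the tail of $\sum \Lambda(n)/n^2$ uncorrected, so agreement is only to about $10^{-4}$. The names are ours.

```python
import math

def mangoldt_table(N):
    """Lambda(n) for 0 <= n <= N, by marking the prime powers found with a sieve."""
    is_prime = bytearray([1]) * (N + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, math.isqrt(N) + 1):
        if is_prime[i]:
            is_prime[i * i :: i] = bytearray(len(range(i * i, N + 1, i)))
    lam = [0.0] * (N + 1)
    for p in range(2, N + 1):
        if is_prime[p]:
            pk = p
            while pk <= N:
                lam[pk] = math.log(p)
                pk *= p
    return lam

N = 10**5
s = 2.0
lam = mangoldt_table(N)
lhs = math.fsum(lam[n] / n ** s for n in range(1, N + 1))     # sum Lambda(n)/n^2, truncated
minus_zeta_prime = math.fsum(math.log(n) / n ** s for n in range(2, N + 1))
minus_zeta_prime += (math.log(N) + 1) / N                   # integral estimate of the tail
rhs = minus_zeta_prime / (math.pi ** 2 / 6)                  # -zeta'(2)/zeta(2)
print(lhs, rhs)
```

Both numbers come out close to $0.56996$.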


Proposition 6 may be restated in terms of Chebyshev’s $\psi$-function:

$$-\zeta'(s)/\zeta(s) = \int_1^{\infty} x^{-s} \, d\psi(x) = \int_0^{\infty} e^{-sx} \, d\psi(e^x) \quad \text{for } \sigma > 1. \qquad (6)$$


We are going to deduce from (6) that the function $\zeta(s)$ has no zeros on the line $\Re s = 1$.
Actually we will prove a more general result:

Proposition 7 Let $f(s)$ be holomorphic in the closed half-plane $\Re s \ge 1$, except for a
simple pole at $s = 1$. If, for $\Re s > 1$, $f(s) \ne 0$ and

$$-f'(s)/f(s) = \int_0^{\infty} e^{-sx} \, d\phi(x),$$

where $\phi(x)$ is a nondecreasing function for $x \ge 0$, then

$$f(1 + it) \ne 0 \quad \text{for every real } t \ne 0.$$
Proof Put $s = \sigma + it$, where $\sigma$ and $t$ are real, and let
$$g(\sigma, t) = -\Re\{f'(s)/f(s)\}.$$
Thus

$$g(\sigma, t) = \int_0^{\infty} e^{-\sigma x} \cos(tx) \, d\phi(x) \quad \text{for } \sigma > 1.$$

Hence, by Schwarz’s inequality (Chapter I, §10),

$$g(\sigma, t)^2 \le \int_0^{\infty} e^{-\sigma x} \, d\phi(x) \int_0^{\infty} e^{-\sigma x} \cos^2(tx) \, d\phi(x) = g(\sigma, 0) \int_0^{\infty} e^{-\sigma x}\{1 + \cos(2tx)\} \, d\phi(x)/2 = g(\sigma, 0)\{g(\sigma, 0) + g(\sigma, 2t)\}/2.$$

Since $f(s)$ has a simple pole at $s = 1$, by comparing the Laurent series of $f(s)$ and
$f'(s)$ at $s = 1$ (see Chapter I, §5) we see that

$$(\sigma - 1) g(\sigma, 0) \to 1 \quad \text{as } \sigma \to 1+.$$

Similarly if $f(s)$ has a zero of multiplicity $m(t) \ge 0$ at $1 + it$, where $t \ne 0$, then by
comparing the Taylor series of $f(s)$ and $f'(s)$ at $s = 1 + it$ we see that

$$(\sigma - 1) g(\sigma, t) \to -m(t) \quad \text{as } \sigma \to 1+.$$

Thus if we multiply the inequality for $g(\sigma, t)^2$ by $(\sigma - 1)^2$ and let $\sigma \to 1+$, we obtain

$$m(t)^2 \le \{1 - m(2t)\}/2 \le 1/2.$$

Therefore, since $m(t)$ is an integer, $m(t) = 0$.


For $f(s) = \zeta(s)$, Proposition 7 gives the result of Hadamard and de la Vallée
Poussin:

Corollary 8 $\zeta(1 + it) \ne 0$ for every real $t \ne 0$.

The use of Schwarz’s inequality to prove Corollary 8 seems more natural than the
usual proof by means of the inequality $3 + 4\cos\theta + \cos 2\theta \ge 0$. It follows from
Corollary 8 that $-\zeta'(s)/\zeta(s) - 1/(s - 1)$ is holomorphic in the closed half-plane
$\sigma \ge 1$. Hence, by (6), the hypotheses of the following theorem, due to Ikehara (1931),
are satisfied with

$$F(s) = -\zeta'(s)/\zeta(s), \qquad \phi(x) = \psi(e^x), \qquad A = h = 1.$$


Theorem 9 Let $\phi(x)$ be a nondecreasing function for $x \ge 0$ such that the Laplace
transform

$$F(s) = \int_0^{\infty} e^{-sx} \, d\phi(x)$$

is defined for $\Re s > h$, where $h > 0$. If there exists a constant $A$ and a function $G(s)$,
which is continuous in the closed half-plane $\Re s \ge h$, such that

$$G(s) = F(s) - Ah/(s - h) \quad \text{for } \Re s > h,$$

then

$$\phi(x) \sim A e^{hx} \quad \text{for } x \to +\infty.$$


Proof For each $X > 0$ we have

$$\int_0^X e^{-sx} \, d\phi(x) = e^{-sX}\{\phi(X) - \phi(0)\} + s \int_0^X e^{-sx}\{\phi(x) - \phi(0)\} \, dx.$$

For real $s = \rho > h$ both terms on the right are nonnegative and the integral on the left
has a finite limit as $X \to \infty$. Hence $e^{-\rho X}\phi(X)$ is a bounded function of $X$ for each
$\rho > h$. It follows that if $\Re s > h$ we can let $X \to \infty$ in the last displayed equation,
obtaining

$$F(s) = s \int_0^{\infty} e^{-sx}\{\phi(x) - \phi(0)\} \, dx \quad \text{for } \Re s > h.$$

Hence

$$\{G(s) - A\}/s = F(s)/s - A/(s - h) = \int_0^{\infty} e^{-(s-h)x}\{\alpha(x) - A\} \, dx,$$

where $\alpha(x) = e^{-hx}\{\phi(x) - \phi(0)\}$. Thus we will prove the theorem if we prove the
following statement:
Let $\alpha(x)$ be a nonnegative function for $x \ge 0$ such that

$$g(s) = \int_0^{\infty} e^{-sx}\{\alpha(x) - A\} \, dx,$$

where $s = \sigma + it$, is defined for every $\sigma > 0$ and the limit

$$\gamma(t) = \lim_{\sigma \to +0} g(s)$$

exists uniformly on any finite interval $-T \le t \le T$. If, for some $h > 0$, $e^{hx}\alpha(x)$ is a
nondecreasing function, then

$$\lim_{x \to \infty} \alpha(x) = A.$$

In the proof of this statement we will use the fact that the Fourier transform

$$\hat{k}(u) = \int_{-\infty}^{\infty} e^{iut} k(t) \, dt$$
of the function

$$k(t) = 1 - |t| \quad \text{for } |t| \le 1, \qquad k(t) = 0 \quad \text{for } |t| \ge 1,$$

has the properties

$$\hat{k}(u) \ge 0 \quad \text{for } -\infty < u < \infty, \qquad C = \int_{-\infty}^{\infty} \hat{k}(u) \, du < \infty.$$

Indeed

$$\hat{k}(u) = \int_{-1}^{1} e^{iut}(1 - |t|) \, dt = 2 \int_0^1 (1 - t) \cos ut \, dt = 2(1 - \cos u)/u^2.$$
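The closed form for $\hat{k}(u)$ and the finiteness of $C$ can be confirmed by quadrature. The sketch below uses composite Simpson's rule (our own choice; any accurate quadrature would do). Writing $2(1 - \cos u)/u^2 = 4\sin^2(u/2)/u^2$ avoids cancellation near $u = 0$ and makes $\hat{k}(u) \ge 0$ evident. The value $C = 2\pi$ used in the last check is not needed in the text; it follows from Fourier inversion, $C = 2\pi k(0)$.

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson's rule with n (even) subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def khat_closed(u):
    """2(1 - cos u)/u^2, written as 4 sin^2(u/2)/u^2; the limit at u = 0 is 1."""
    if u == 0:
        return 1.0
    return 4.0 * math.sin(u / 2) ** 2 / u ** 2

def khat_numeric(u, n=200):
    """2 * integral_0^1 (1 - t) cos(u t) dt, the Fourier transform of the tent k(t)."""
    return 2.0 * simpson(lambda t: (1 - t) * math.cos(u * t), 0.0, 1.0, n)

for u in (0.5, 1.0, 3.0, 10.0):
    print(u, khat_numeric(u), khat_closed(u))

# C = integral of khat over the real line, truncated to [-U, U] (tails at most 8/U)
U = 1000.0
C = simpson(khat_closed, -U, U, 80000)
print(C, 2 * math.pi)
```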


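Let $\varepsilon$, $\lambda$, $y$ be arbitrary positive numbers. If $s = \varepsilon + i\lambda t$, then

$$\lambda \int_{-1}^{1} e^{i\lambda ty} k(t) g(s) \, dt = \lambda \int_{-1}^{1} e^{i\lambda ty} k(t) \int_0^{\infty} e^{-sx}\{\alpha(x) - A\} \, dx \, dt$$
$$= \lambda \int_0^{\infty} e^{-\varepsilon x}\{\alpha(x) - A\} \int_{-1}^{1} e^{i\lambda t(y - x)} k(t) \, dt \, dx$$
$$= \lambda \int_0^{\infty} e^{-\varepsilon x} \alpha(x) \hat{k}(\lambda(y - x)) \, dx - A\lambda \int_0^{\infty} e^{-\varepsilon x} \hat{k}(\lambda(y - x)) \, dx.$$

When $\varepsilon \to +0$ the left side has the limit

$$\chi(y) := \lambda \int_{-1}^{1} e^{i\lambda ty} k(t) \gamma(\lambda t) \, dt$$

and the second term on the right has the limit

$$-A\lambda \int_0^{\infty} \hat{k}(\lambda(y - x)) \, dx.$$

Consequently the first term on the right also has a finite limit. It follows that

$$\lambda \int_0^{\infty} \alpha(x) \hat{k}(\lambda(y - x)) \, dx$$

is finite and is the limit of the first term on the right. Thus

$$\chi(y) = \lambda \int_0^{\infty} \{\alpha(x) - A\} \hat{k}(\lambda(y - x)) \, dx = \int_{-\infty}^{\lambda y} \{\alpha(y - v/\lambda) - A\} \hat{k}(v) \, dv.$$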
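By the ‘Riemann-Lebesgue lemma’, $\chi(y) \to 0$ as $y \to \infty$. In fact this may be
proved in the following way. We have

$$\chi(y) = \int_{-\infty}^{\infty} e^{i\lambda ty} \omega(t) \, dt,$$

where

$$\omega(t) = \lambda k(t) \gamma(\lambda t).$$

Changing the variable of integration to $t + \pi/\lambda y$, we obtain

$$\chi(y) = -\int_{-\infty}^{\infty} e^{i\lambda ty} \omega(t + \pi/\lambda y) \, dt.$$

Hence

$$2\chi(y) = \int_{-\infty}^{\infty} e^{i\lambda ty}\{\omega(t) - \omega(t + \pi/\lambda y)\} \, dt$$

and

$$2|\chi(y)| \le \int_{-\infty}^{\infty} |\omega(t) - \omega(t + \pi/\lambda y)| \, dt.$$

Since $\omega(t)$ is continuous and vanishes outside a finite interval, it follows that
$\chi(y) \to 0$ as $y \to \infty$.
Since

$$\int_{-\infty}^{\lambda y} \hat{k}(v) \, dv \to C \quad \text{as } y \to \infty,$$

we deduce that

$$\lim_{y \to \infty} \int_{-\infty}^{\lambda y} \alpha(y - v/\lambda) \hat{k}(v) \, dv = AC \quad \text{for every } \lambda > 0.$$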


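We now make use of the fact that $e^{hx}\alpha(x)$ is a nondecreasing function. Choose any
$\delta \in (0, 1)$. If $y = x + \delta$, where $x > 0$, then for $|v| \le \lambda\delta$

$$\alpha(y - v/\lambda) \ge e^{-h(\delta - v/\lambda)} \alpha(x) \ge e^{-2h\delta} \alpha(x)$$

and hence

$$\int_{-\infty}^{\lambda y} \alpha(y - v/\lambda) \hat{k}(v) \, dv \ge e^{-2h\delta} \alpha(x) \int_{-\lambda\delta}^{\lambda\delta} \hat{k}(v) \, dv.$$

We can choose $\lambda = \lambda(\delta)$ so large that the integral on the right exceeds $(1 - \delta)C$. Then,
letting $x \to \infty$ we obtain

$$AC \ge e^{-2h\delta}(1 - \delta) C \limsup_{x \to \infty} \alpha(x).$$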
x 0o 
Since this holds for arbitrarily small $\delta > 0$, it follows that

$$\limsup_{x \to \infty} \alpha(x) \le A.$$

Thus there exists a positive constant $M$ such that

$$0 \le \alpha(x) \le M \quad \text{for all } x \ge 0.$$

On the other hand, if $y = x - \delta$, where $x > \delta$, then for $|v| \le \lambda\delta$

$$\alpha(y - v/\lambda) \le e^{h(\delta + v/\lambda)} \alpha(x) \le e^{2h\delta} \alpha(x)$$

and hence

$$\int_{-\infty}^{\lambda y} \alpha(y - v/\lambda) \hat{k}(v) \, dv \le e^{2h\delta} \alpha(x) \int_{-\lambda\delta}^{\lambda\delta} \hat{k}(v) \, dv + M \int_{|v| \ge \lambda\delta} \hat{k}(v) \, dv.$$

We can choose $\lambda = \lambda(\delta)$ so large that the second term on the right is less than $\delta C$.
Then, letting $x \to \infty$ we obtain

$$AC \le e^{2h\delta} C \liminf_{x \to \infty} \alpha(x) + \delta C.$$

Since this holds for arbitrarily small $\delta > 0$, it follows that

$$A \le \liminf_{x \to \infty} \alpha(x).$$

Combining this with the inequality of the previous paragraph, we conclude that
$\lim_{x \to \infty} \alpha(x) = A$.

Applying Theorem 9 to the special case mentioned before the statement of the
theorem, we obtain $\psi(e^x) \sim e^x$. As we have already seen in §2, this is equivalent to
the prime number theorem.


4 The Riemann Hypothesis 


In his celebrated paper on the distribution of prime numbers Riemann (1859) proved
only two results. He showed that the definition of $\zeta(s)$ can be extended to the
whole complex plane, so that $\zeta(s) - 1/(s - 1)$ is everywhere holomorphic, and he
proved that the values of $\zeta(s)$ and $\zeta(1 - s)$ are connected by a certain functional
equation. This functional equation will now be derived by one of the two methods 
which Riemann himself used. It is based on a remarkable identity which Jacobi (1829) 
used in his treatise on elliptic functions. 


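Proposition 10 For any $t, y \in \mathbb{R}$ with $y > 0$,

$$\sum_{n=-\infty}^{\infty} e^{-(t+n)^2 \pi/y} = y^{1/2} \sum_{n=-\infty}^{\infty} e^{-n^2 \pi y} e^{2\pi i n t}. \qquad (7)$$

In particular,

$$\sum_{n=-\infty}^{\infty} e^{-n^2 \pi/y} = y^{1/2} \sum_{n=-\infty}^{\infty} e^{-n^2 \pi y}. \qquad (8)$$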
Proof Put $f(v) = e^{-v^2 \pi/y}$ and let

$$g(u) = \int_{-\infty}^{\infty} f(v) e^{-2\pi i u v} \, dv$$

be the Fourier transform of $f(v)$. We are going to show that

$$\sum_{n=-\infty}^{\infty} f(v + n) = \sum_{n=-\infty}^{\infty} g(n) e^{2\pi i n v}.$$

Let

$$F(v) = \sum_{n=-\infty}^{\infty} f(v + n).$$

This infinite series is uniformly convergent for $0 \le v \le 1$, and so also is the series
obtained by term by term differentiation. Hence $F(v)$ is a continuously differentiable
function. Consequently, since it is periodic with period 1, it is the sum of its own
Fourier series:

$$F(v) = \sum_{m=-\infty}^{\infty} c_m e^{2\pi i m v},$$

where

$$c_m = \int_0^1 F(v) e^{-2\pi i m v} \, dv.$$

We can evaluate $c_m$ by term by term integration:

$$c_m = \sum_{n=-\infty}^{\infty} \int_0^1 f(v + n) e^{-2\pi i m v} \, dv = \sum_{n=-\infty}^{\infty} \int_n^{n+1} f(v) e^{-2\pi i m v} \, dv = \int_{-\infty}^{\infty} f(v) e^{-2\pi i m v} \, dv = g(m).$$


The argument up to this point is an instance of Poisson’s summation formula. To
evaluate $g(u)$ in the case $f(v) = e^{-v^2 \pi/y}$ we differentiate with respect to $u$ and integrate
by parts, obtaining

$$g'(u) = -2\pi i \int_{-\infty}^{\infty} v e^{-v^2 \pi/y} e^{-2\pi i u v} \, dv = iy \int_{-\infty}^{\infty} e^{-2\pi i u v} \, d\big(e^{-v^2 \pi/y}\big)$$
$$= -iy \int_{-\infty}^{\infty} e^{-v^2 \pi/y} \, d\big(e^{-2\pi i u v}\big) = -2\pi u y \, g(u).$$

The solution of this first order linear differential equation is

$$g(u) = g(0) e^{-\pi u^2 y}.$$

Moreover

$$g(0) = \int_{-\infty}^{\infty} e^{-v^2 \pi/y} \, dv = (y/\pi)^{1/2} J,$$

where

$$J = \int_{-\infty}^{\infty} e^{-v^2} \, dv.$$

Thus we have proved that

$$\sum_{n=-\infty}^{\infty} e^{-(v+n)^2 \pi/y} = (y/\pi)^{1/2} J \sum_{n=-\infty}^{\infty} e^{-n^2 \pi y} e^{2\pi i n v}.$$

Substituting $v = 0$, $y = 1$, we obtain $J = \pi^{1/2}$.


The theta function

$$\vartheta(x) = \sum_{n=-\infty}^{\infty} e^{-n^2 \pi x} \qquad (x > 0)$$

arises not only in the theory of elliptic functions, as we will see in Chapter XII, but
also in problems of heat conduction and statistical mechanics. The transformation law

$$\vartheta(x) = x^{-1/2} \vartheta(1/x)$$

is very useful for computational purposes since, when $x$ is small, the series for $\vartheta(x)$
converges extremely slowly but the series for $\vartheta(1/x)$ converges extremely rapidly.
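Both sides of the transformation law can be summed directly, and the number of terms needed shows why it is useful. In the sketch below (the function name is ours) the series is stopped once a term falls below $10^{-17}$ of the running total, and the function returns the number of positive $n$ it used.

```python
import math

def theta(x):
    """theta(x) = sum over all integers n of exp(-pi n^2 x), for x > 0.
    Returns (value, number of positive n used)."""
    total, n = 1.0, 0
    while True:
        n += 1
        term = 2.0 * math.exp(-math.pi * n * n * x)   # n and -n together
        total += term
        if term < 1e-17 * total:
            return total, n

for x in (0.01, 0.1, 0.5, 2.0):
    lhs, terms_lhs = theta(x)
    rhs, terms_rhs = theta(1 / x)
    print(x, lhs, x ** -0.5 * rhs, terms_lhs, terms_rhs)
```

At $x = 0.01$ the left side needs several dozen terms, while $\vartheta(100)$ is already $1$ to machine precision after a single term.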

Since the functional equation of Riemann’s zeta function involves Euler’s gamma
function, we summarize here the main properties of the latter. Euler (1729) defined his
function $\Gamma(z)$ by

$$1/\Gamma(z) = \lim_{n \to \infty} z(z+1) \cdots (z+n)/(n!\, n^z),$$

where $n^z = \exp(z \log n)$ and the limit exists for every $z \in \mathbb{C}$. It follows from the
definition that $1/\Gamma(z)$ is everywhere holomorphic and that its only zeros are simple
zeros at the points $z = 0, -1, -2, \ldots$. Moreover $\Gamma(1) = 1$ and

$$\Gamma(z + 1) = z\Gamma(z).$$

Hence $\Gamma(n + 1) = n!$ for any positive integer $n$. By putting $\Gamma(z + 1) = z!$ the definition
of the factorial function may be extended to any $z \in \mathbb{C}$ which is not a negative integer.
Wielandt (1939) has characterized $\Gamma(z)$ as the only solution of the functional equation

$$F(z + 1) = zF(z)$$

with $F(1) = 1$ which is holomorphic in the half-plane $\Re z > 0$ and bounded for
$1 \le \Re z \le 2$.
It follows from the definition of $\Gamma(z)$ and the product formula for the sine function
that

$$\Gamma(z)\Gamma(1 - z) = \pi/\sin \pi z.$$


Many definite integrals may be evaluated in terms of the gamma function. By repeated
integration by parts it may be seen that, if $\Re z > 0$ and $n \in \mathbb{N}$, then

$$n!\, n^z/z(z + 1) \cdots (z + n) = \int_0^n (1 - t/n)^n t^{z-1} \, dt,$$

where $t^{z-1} = \exp\{(z - 1) \log t\}$. Letting $n \to \infty$, we obtain the integral representation

$$\Gamma(z) = \int_0^{\infty} e^{-t} t^{z-1} \, dt \quad \text{for } \Re z > 0. \qquad (9)$$

It follows that $\Gamma(1/2) = \pi^{1/2}$, since

$$\int_0^{\infty} e^{-t} t^{-1/2} \, dt = \int_{-\infty}^{\infty} e^{-v^2} \, dv = \pi^{1/2},$$

by the proof of Proposition 10. It was already shown by Euler (1730) that

$$B(x, y) = \int_0^1 t^{x-1}(1 - t)^{y-1} \, dt = \Gamma(x)\Gamma(y)/\Gamma(x + y),$$

the relation holding for $\Re x > 0$ and $\Re y > 0$. The unit ball in $\mathbb{R}^n$ has volume
$\kappa_n := \pi^{n/2}/(n/2)!$ and surface content $n\kappa_n$. Stirling’s formula, $n! \sim (n/e)^n \sqrt{2\pi n}$,
follows at once from the integral representation

$$\log \Gamma(z) = (z - 1/2) \log z - z + (1/2) \log 2\pi - \int_0^{\infty} \{t - \lfloor t \rfloor - 1/2\}/(z + t) \, dt,$$

valid for any $z \in \mathbb{C}$ not lying on the half-line $(-\infty, 0]$. Euler’s constant

$$\gamma = \lim_{n \to \infty} (1 + 1/2 + 1/3 + \cdots + 1/n - \log n) \approx 0.5772157$$

may also be defined by $\gamma = -\Gamma'(1)$.
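Several of these facts can be spot-checked with Python's `math.gamma` and `math.lgamma`. The sketch below is only a numerical illustration at real arguments; `beta_numeric`, `kappa`, `stirling_ratio` and `euler_gamma_approx` are our own names, and the beta integral is evaluated with Simpson's rule, which is adequate here because the test exponents keep the integrand continuous on $[0, 1]$.

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# Reflection formula Gamma(z) Gamma(1 - z) = pi / sin(pi z), at real z in (0, 1)
for z in (0.1, 0.25, 0.5, 0.9):
    print(z, math.gamma(z) * math.gamma(1 - z), math.pi / math.sin(math.pi * z))

def beta_numeric(x, y, n=2000):
    """Euler's integral B(x, y) by quadrature (x, y >= 1 keeps the integrand finite)."""
    return simpson(lambda t: t ** (x - 1) * (1 - t) ** (y - 1), 0.0, 1.0, n)

def beta_gamma(x, y):
    return math.gamma(x) * math.gamma(y) / math.gamma(x + y)

def kappa(n):
    """Volume of the unit ball in R^n: pi^(n/2) / (n/2)!, with (n/2)! = Gamma(n/2 + 1)."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)

def stirling_ratio(n):
    """n! / ((n/e)^n sqrt(2 pi n)), computed in logarithms to avoid overflow."""
    log_ratio = math.lgamma(n + 1) - (n * math.log(n / math.e) + 0.5 * math.log(2 * math.pi * n))
    return math.exp(log_ratio)

def euler_gamma_approx(n):
    """1 + 1/2 + ... + 1/n - log n, which exceeds gamma by about 1/(2n)."""
    return math.fsum(1.0 / k for k in range(1, n + 1)) - math.log(n)

print(math.gamma(0.5), math.sqrt(math.pi))
print(beta_numeric(2.5, 3.5), beta_gamma(2.5, 3.5))
print(kappa(2), kappa(3), stirling_ratio(100), euler_gamma_approx(10**6))
```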
We now return to the Riemann zeta function. 


Proposition 11 The function $Z(s) = \pi^{-s/2}\Gamma(s/2)\zeta(s)$ satisfies the functional equation

$$Z(s) = Z(1 - s) \quad \text{for } 0 < \sigma < 1.$$

Proof From the representation (9) of the gamma function we obtain, for $\sigma > 0$ and
$n \ge 1$,

$$\int_0^{\infty} x^{s/2-1} e^{-n^2 \pi x} \, dx = \pi^{-s/2}\Gamma(s/2) n^{-s}.$$

Hence, if $\sigma > 1$,

$$Z(s) = \sum_{n=1}^{\infty} \int_0^{\infty} x^{s/2-1} e^{-n^2 \pi x} \, dx = \int_0^{\infty} x^{s/2-1} \varpi(x) \, dx,$$

where

$$\varpi(x) = \sum_{n=1}^{\infty} e^{-n^2 \pi x}.$$

By Proposition 10,

$$2\varpi(x) + 1 = x^{-1/2}[2\varpi(1/x) + 1].$$

Hence

$$Z(s) = \int_1^{\infty} x^{s/2-1} \varpi(x) \, dx + \int_0^1 x^{s/2-1}\{x^{-1/2}\varpi(1/x) + (x^{-1/2} - 1)/2\} \, dx$$
$$= \int_1^{\infty} x^{s/2-1} \varpi(x) \, dx + \int_0^1 x^{s/2-3/2} \varpi(1/x) \, dx + 1/(s - 1) - 1/s$$
$$= \int_1^{\infty} (x^{s/2-1} + x^{-s/2-1/2}) \varpi(x) \, dx + 1/s(s - 1).$$

The integral on the right is convergent for all $s$ and thus provides the analytic
continuation of $Z(s)$ to the whole plane. Moreover the right side is unchanged if $s$ is replaced
by $1 - s$.
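The last formula of the proof is also a practical way to compute $\zeta(s)$ for real $s$ outside $\sigma > 1$. The sketch below is a minimal illustration under stated assumptions: it evaluates the integral with Simpson's rule on $[1, 30]$ (beyond which $\varpi(x) < 10^{-40}$), keeps seven terms of $\varpi(x)$ (enough for $x \ge 1$), and divides by $\pi^{-s/2}\Gamma(s/2)$, so it works only for real $s \ne 1$ at which $\Gamma(s/2)$ is finite. All names are ours.

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def varpi(x):
    """varpi(x) = sum_{n >= 1} exp(-pi n^2 x); seven terms suffice for x >= 1."""
    return math.fsum(math.exp(-math.pi * n * n * x) for n in range(1, 8))

def completed_zeta(s, upper=30.0, steps=30000):
    """Z(s) = pi^(-s/2) Gamma(s/2) zeta(s) for real s other than 0 and 1, from
    Z(s) = int_1^oo (x^(s/2-1) + x^(-s/2-1/2)) varpi(x) dx + 1/(s(s-1))."""
    integrand = lambda x: (x ** (s / 2 - 1) + x ** (-s / 2 - 0.5)) * varpi(x)
    return simpson(integrand, 1.0, upper, steps) + 1.0 / (s * (s - 1.0))

def zeta_real(s):
    """zeta(s) recovered from Z(s); s must avoid 1, 0, -2, -4, ..."""
    return completed_zeta(s) * math.pi ** (s / 2) / math.gamma(s / 2)

for s in (2.0, 4.0, 0.5, -1.0):
    print(s, zeta_real(s))
```

The values at $s = 2$ and $s = 4$ reproduce $\pi^2/6$ and $\pi^4/90$, while $s = 1/2$ and $s = -1$ lie outside the half-plane where the series (5) converges, so they genuinely exercise the analytic continuation ($\zeta(-1) = -1/12$).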


The function $Z(s)$ in Proposition 11 is occasionally called the completed zeta
function. In its product representation

$$Z(s) = \pi^{-s/2}\Gamma(s/2) \prod_p (1 - p^{-s})^{-1} \qquad (\sigma > 1)$$

it makes sense to regard $\pi^{-s/2}\Gamma(s/2)$ as an Euler factor at $\infty$, complementing the
Euler factors $(1 - p^{-s})^{-1}$ at the primes $p$.

It follows from Proposition 11 and the previously stated properties of the gamma
function that the definition of $\zeta(s)$ may be extended to the whole complex plane, so
that $\zeta(s) - 1/(s - 1)$ is everywhere holomorphic and $\zeta(s) = 0$ if $s = -2, -4, -6, \ldots$.
Since $\zeta(s) \ne 0$ for $\sigma \ge 1$ and $\zeta(0) = -1/2$, the functional equation shows that these
‘trivial’ zeros of $\zeta(s)$ are its only zeros in the half-plane $\sigma \le 0$. Hence all ‘nontrivial’
zeros of $\zeta(s)$ lie in the strip $0 < \sigma < 1$ and are symmetrically situated with respect
to the line $\sigma = 1/2$. The famous Riemann hypothesis asserts that all zeros in this strip
actually lie on the line $\sigma = 1/2$.
Since $\zeta(\bar{s}) = \overline{\zeta(s)}$, the zeros of $\zeta(s)$ are also symmetric with respect to the real
axis. Furthermore $\zeta(s)$ has no real zeros in the strip $0 < \sigma < 1$, since

$$(1 - 2^{1-\sigma})\zeta(\sigma) = (1 - 2^{-\sigma}) + (3^{-\sigma} - 4^{-\sigma}) + \cdots > 0 \quad \text{for } 0 < \sigma < 1.$$

It has been verified by van de Lune et al. (1986), with the aid of a supercomputer,
that the $1.5 \times 10^9$ zeros of $\zeta(s)$ in the rectangle $0 < \sigma < 1$, $0 < t < T$, where
$T = 545439823.215$, are all simple and lie on the line $\sigma = 1/2$.

The location of the zeros of $\zeta(s)$ is intimately connected with the asymptotic
behaviour of $\pi(x)$. Let $a^*$ denote the least upper bound of the real parts of all zeros
of $\zeta(s)$. Then $1/2 \le a^* \le 1$, since it is known that $\zeta(s)$ does have zeros in the strip
$0 < \sigma < 1$, and the Riemann hypothesis is equivalent to $a^* = 1/2$. It was shown by
von Koch (1901) that

$$\psi(x) = x + O(x^{a^*} \log^2 x)$$

and hence

$$\pi(x) = \mathrm{Li}(x) + O(x^{a^*} \log x).$$


(Actually von Koch assumed $a^* = 1/2$, but his argument can be extended without
difficulty.) It should be noted that these estimates are of interest only if $a^* < 1$.
On the other hand if, for some $a$ such that $0 < a < 1$,

$$\pi(x) = \mathrm{Li}(x) + O(x^a \log x),$$

then

$$\theta(x) = x + O(x^a \log^2 x).$$

By the remark after the proof of Lemma 3, it follows that

$$\psi(x) = x + x^{1/2} + O(x^a \log^2 x) + O(x^{1/3} \log^2 x).$$

But for $\sigma > 1$ we have

$$-\zeta'(s)/\zeta(s) = \int_1^{\infty} x^{-s} \, d\psi(x) = s \int_1^{\infty} \psi(x) x^{-s-1} \, dx$$

and hence

$$-\zeta'(s)/\zeta(s) - s/(s - 1) - s/(s - 1/2) = s \int_1^{\infty} \{\psi(x) - x - x^{1/2}\} x^{-s-1} \, dx.$$

The integral on the right is uniformly convergent in the half-plane $\sigma \ge \varepsilon + \max(a, 1/3)$,
for any $\varepsilon > 0$, and represents there a holomorphic function. It follows that
$1/2 \le a^* \le \max(a, 1/3)$. Consequently $a^* \le a$ and $\psi(x) = x + O(x^a \log^2 x)$.

Combining this with von Koch’s result, we see that the Riemann hypothesis is
equivalent to

$$\pi(x) = \mathrm{Li}(x) + O(x^{1/2} \log x)$$

and to

$$\psi(x) = x + O(x^{1/2} \log^2 x).$$

Since it is still not known if $a^* < 1$, the error terms here are substantially smaller than
any that have actually been established.
It has been shown by Cramér (1922) that

$$(\log x)^{-1} \int_2^x (\psi(t)/t - 1)^2 \, dt$$

has a finite limit as $x \to \infty$ if the Riemann hypothesis holds, and is unbounded if it
does not. Similarly, for each $a < 1$,


x 
aaa (y(t) = ir “ai 
2 


is bounded but does not have a finite limit as $x \to \infty$ if the Riemann hypothesis holds,
and is unbounded otherwise.

For all values of $x$ listed in Table 1 we have $\pi(x) < \mathrm{Li}(x)$, and at one time it
was conjectured that this inequality holds for all $x > 0$. However, Littlewood (1914)
disproved the conjecture by showing that there exists a constant $c > 0$ such that

$$\pi(x_n) - \mathrm{Li}(x_n) > c\, x_n^{1/2} \log\log\log x_n/\log x_n$$

for some sequence $x_n \to \infty$ and

$$\pi(\xi_n) - \mathrm{Li}(\xi_n) < -c\, \xi_n^{1/2} \log\log\log \xi_n/\log \xi_n$$

for some sequence $\xi_n \to \infty$. This is a quite remarkable result, since no actual value
of $x$ is known for which $\pi(x) > \mathrm{Li}(x)$. However, it is known that $\pi(x) > \mathrm{Li}(x)$ for
some $x$ between $1.398201 \times 10^{316}$ and $1.398244 \times 10^{316}$.

In this connection it may be noted that Rosser and Schoenfeld (1962) have shown
that $\pi(x) > x/\log x$ for all $x \ge 17$. It had previously been shown by Rosser (1939)
that $p_n > n \log n$ for all $n \ge 1$.
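The two inequalities just quoted can be scanned over a finite range. The sketch below (the name `first_failures` is ours) checks $p_n > n \log n$ for every prime $p_n \le 10^5$ and $\pi(x) > x/\log x$ at the integers $17 \le x \le 10^5$. It tests integers only, not all real $x$, so it illustrates the theorems rather than proving anything.

```python
import math

def sieve(n):
    """bytearray flags: flags[k] == 1 iff k is prime, for 0 <= k <= n."""
    flags = bytearray([1]) * (n + 1)
    flags[0] = flags[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if flags[i]:
            flags[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return flags

def first_failures(limit):
    """Scan 2 <= x <= limit and return (first n with p_n <= n log n,
    first integer x >= 17 with pi(x) <= x / log x); None means no failure."""
    flags = sieve(limit)
    bad_n = bad_x = None
    count = 0                                  # pi(x) as x increases
    for x in range(2, limit + 1):
        if flags[x]:
            count += 1                         # x is the count-th prime p_n
            if bad_n is None and x <= count * math.log(count):
                bad_n = count
        if bad_x is None and x >= 17 and count <= x / math.log(x):
            bad_x = x
    return bad_n, bad_x

print(first_failures(10**5))   # (None, None)
```

The threshold 17 matters: at $x = 10$, for example, $\pi(10) = 4 < 10/\log 10$.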

Not content with not being able to prove the Riemann hypothesis, Montgomery
(1973) has assumed it and made a further conjecture. For given $\beta > 0$, let $N_T(\beta)$ be
the number of zeros $1/2 + i\gamma$, $1/2 + i\gamma'$ of $\zeta(s)$ with $0 < \gamma' < \gamma \le T$ such that

$$\gamma - \gamma' \le 2\pi\beta/\log T.$$

Montgomery’s conjecture is that, for each fixed $\beta > 0$,

$$N_T(\beta) \sim (T/2\pi) \log T \int_0^{\beta} \{1 - (\sin \pi u/\pi u)^2\} \, du \quad \text{as } T \to \infty.$$

Goldston (1988) has shown that this is equivalent to

$$\int_1^{T^{\beta}} \{\psi(x + x/T) - \psi(x) - x/T\}^2 x^{-2} \, dx \sim (\beta - 1/2) \log^2 T/T \quad \text{as } T \to \infty,$$

for each fixed $\beta \ge 1$, where $\psi(x)$ is Chebyshev’s function.
In the language of physics Montgomery’s conjecture says that $1 - (\sin \pi u/\pi u)^2$
is the pair correlation function of the zeros of $\zeta(s)$. Dyson pointed out that this is
also the pair correlation function of the normalized eigenvalues of a random $N \times N$
Hermitian matrix in the limit $N \to \infty$. A great deal more is known about this so-called
Gaussian unitary ensemble, which Wigner (1955) used to model the statistical
properties of the spectra of complex nuclei. For example, if the eigenvalues are normalized
so that the average difference between consecutive eigenvalues is 1, then the
probability that the difference between an eigenvalue and the least eigenvalue greater
than it does not exceed $\beta$ converges as $N \to \infty$ to

$$\int_0^{\beta} p(u) \, du,$$

where the density function $p(u)$ can be explicitly specified.

It has been further conjectured that the spacings of the normalized zeros of the
zeta-function have the same distribution. To make this precise, let the zeros $1/2 + i\gamma_n$
of $\zeta(s)$ with $\gamma_n > 0$ be numbered so that

$$\gamma_1 \le \gamma_2 \le \cdots.$$

Since it is known that the number of $\gamma$’s in an interval $[T, T + 1]$ is asymptotic to
$(\log T)/2\pi$ as $T \to \infty$, we put

$$\tilde{\gamma}_n = (\gamma_n \log \gamma_n)/2\pi,$$

so that the average difference between consecutive $\tilde{\gamma}_n$ is 1. If $\delta_n = \tilde{\gamma}_{n+1} - \tilde{\gamma}_n$, and if
$\nu_N(\beta)$ is the number of $\delta_n \le \beta$ with $n \le N$, then the conjecture is that for each $\beta > 0$

$$\nu_N(\beta)/N \to \int_0^{\beta} p(u) \, du \quad \text{as } N \to \infty.$$

This nearest neighbour conjecture and the Montgomery pair correlation conjecture
have been extensively tested by Odlyzko (1987/9) with the aid of a supercomputer.
There is good agreement between the conjectures and the numerical results.


5 Generalizations and Analogues 


The prime number theorem may be generalized to any algebraic number field in the
following way. Let $K$ be an algebraic number field, i.e. a finite extension of the field
$\mathbb{Q}$ of rational numbers. Let $R$ be the ring of all algebraic integers in $K$, $\mathcal{I}$ the set of all
nonzero ideals of $R$, and $\mathcal{P}$ the subset of prime ideals. For any $A \in \mathcal{I}$, the quotient
ring $R/A$ is finite; its cardinality will be denoted by $|A|$ and called the norm of $A$.

It may be shown that the Dedekind zeta-function

$$\zeta_K(s) = \sum_{A \in \mathcal{I}} |A|^{-s}$$

is defined for $\Re s > 1$ and that the product formula

$$\zeta_K(s) = \prod_{P \in \mathcal{P}} (1 - |P|^{-s})^{-1}$$

holds in this open half-plane. Furthermore the definition of $\zeta_K(s)$ may be extended
so that it is nonzero and holomorphic in the closed half-plane $\Re s \ge 1$, except for a
simple pole at $s = 1$. By applying Ikehara’s theorem we can then obtain the prime
ideal theorem, which was first proved by Landau (1903):

$$\pi_K(x) \sim x/\log x,$$

where $\pi_K(x)$ denotes the number of prime ideals of $R$ with norm $\le x$.

It was shown by Hecke (1917) that the definition of the Dedekind zeta-function
$\zeta_K(s)$ may also be extended so that it is holomorphic in the whole complex plane,
except for the simple pole at $s = 1$, and so that, for some constant $A > 0$ and nonnegative
integers $r_1$, $r_2$ (which can all be explicitly described in terms of the structure
of the algebraic number field $K$),

$$Z_K(s) = A^s \Gamma(s/2)^{r_1} \Gamma(s)^{r_2} \zeta_K(s)$$

satisfies the functional equation

$$Z_K(s) = Z_K(1 - s).$$

The extended Riemann hypothesis asserts that, for every algebraic number field $K$,

$$\zeta_K(s) \ne 0 \quad \text{for } \Re s > 1/2.$$


The numerical evidence for the extended Riemann hypothesis is favourable, although 
in the nature of things it cannot be tested as extensively as the ordinary Riemann 
hypothesis. The extended Riemann hypothesis implies error bounds for the prime ideal 
theorem of the same order as those which the ordinary Riemann hypothesis implies 
for the prime number theorem. However, it also has many other consequences. We 
mention only two. 

It has been shown by Bach (1990), making precise an earlier result of Ankeny
(1952), that if the extended Riemann hypothesis holds then, for each prime $p$, there is a
quadratic non-residue $a \bmod p$ with $a < 2 \log^2 p$. Thus we do not have to search far in
order to find a quadratic non-residue, or to disprove the extended Riemann hypothesis.
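Bach's bound is easy to test over a modest range of odd primes (modulo 2 every nonzero residue is a square, so only odd $p$ make sense here). The sketch below finds the least quadratic non-residue with Euler's criterion, $a^{(p-1)/2} \equiv -1 \pmod p$, and records the largest ratio of that least non-residue to $2 \log^2 p$; the function names are ours, and a finite check of course says nothing about the extended Riemann hypothesis itself.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: list of all primes p <= n."""
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if is_prime[i]:
            is_prime[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i, flag in enumerate(is_prime) if flag]

def least_non_residue(p):
    """Least quadratic non-residue mod an odd prime p, via Euler's criterion."""
    a = 2
    while pow(a, (p - 1) // 2, p) != p - 1:
        a += 1
    return a

odd_primes = [p for p in primes_up_to(10**5) if p > 2]
ratio, worst_p = max((least_non_residue(p) / (2 * math.log(p) ** 2), p) for p in odd_primes)
print(ratio, worst_p, least_non_residue(worst_p))
```

The worst case in this range occurs at the smallest primes, where $2 \log^2 p$ is itself small.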

It will be recalled from Chapter I that if $p$ is a prime and $a$ an integer not divisible
by $p$, then $a^{p-1} \equiv 1 \bmod p$. For each prime $p$ there exists a primitive root, i.e. an
integer $a$ such that $a^k \not\equiv 1 \bmod p$ for $1 \le k < p - 1$. It is easily seen that an even
square is never a primitive root, that an odd square (including 1) is a primitive root
only for the prime $p = 2$, and that $-1$ is a primitive root only for the primes $p = 2, 3$.

Assuming the extended Riemann hypothesis, Hooley (1967) has proved a famous
conjecture of Artin (1927): if the integer $a$ is not a square or $-1$, then there exist
infinitely many primes $p$ for which $a$ is a primitive root. Moreover, if $N_a(x)$ denotes
the number of primes $p \le x$ for which $a$ is a primitive root, then

$$N_a(x) \sim A_a x/\log x \quad \text{for } x \to \infty,$$

where $A_a$ is a positive constant which can be explicitly described. (The expression for
$A_a$ which Artin conjectured requires modification in some cases.)
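The definitions can be explored numerically. The Python sketch below is our own illustration (all function names are hypothetical). It tests whether $a$ is a primitive root mod $p$ by checking that $a^{(p-1)/q} \not\equiv 1$ for every prime $q \mid p - 1$, confirms the remark about $-1$ for small primes, and estimates the proportion of primes $p \le x$ for which 2 is a primitive root. For $a = 2$ the constant $A_2$ is Artin's constant, approximately 0.3739558. We quote that value as a known reference point, not as something the sketch proves.

```python
def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n (n >= 1)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]

def prime_factors(n):
    """Set of distinct prime factors of n, by trial division."""
    factors, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors

def is_primitive_root(a, p):
    """True if a generates the multiplicative group mod the prime p."""
    if a % p == 0:
        return False
    # a has order p - 1 iff a^((p-1)/q) != 1 mod p for every prime q dividing p - 1
    return all(pow(a, (p - 1) // q, p) != 1 for q in prime_factors(p - 1))

def artin_ratio(a, x):
    """N_a(x) / pi(x): proportion of primes p <= x for which a is a primitive root."""
    ps = primes_up_to(x)
    return sum(is_primitive_root(a, p) for p in ps) / len(ps)

print(is_primitive_root(3, 7), is_primitive_root(2, 7))          # True False
print([p for p in primes_up_to(50) if is_primitive_root(-1, p)])  # [2, 3]
print(artin_ratio(2, 10**5))
```

The last line gives an empirical proportion that may be compared with Artin's constant. Convergence is slow, so agreement is only approximate.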

There are also analogues for function fields of these results for number fields. Let 
K be an arbitrary field. A field of algebraic functions of one variable over K is a field 
L which satisfies the following conditions: 


(i) $K \subset L$,
(ii) $L$ contains an element $v$ which is transcendental over $K$, i.e. $v$ satisfies no monic
polynomial equation
$$v^n + a_1 v^{n-1} + \cdots + a_n = 0$$
with coefficients $a_j \in K$,
(iii) $L$ is a finite extension of the field $K(v)$ of rational functions of $v$ with coefficients
from $K$, i.e. $L$ is finite-dimensional as a vector space over $K(v)$.


Let $R$ be a ring with $K \subset R \subset L$ such that $x \in L \setminus R$ implies $x^{-1} \in R$. Then the
set $P$ of all $a \in R$ such that $a = 0$ or $a^{-1} \notin R$ is an ideal of $R$, and actually the unique
maximal ideal of $R$. Hence the quotient ring $R/P$ is a field. Since $R$ is the set of all
$x \in L$ such that $xP \subset P$, it is uniquely determined by $P$. The ideal $P$ will be called a
prime divisor of the field $L$ and $R/P$ its residue field. It may be shown that the residue
field $R/P$ is a finite extension of (a field isomorphic to) $K$.

An arbitrary divisor of the field $L$ is a formal product $A = \prod_P P^{v_P}$ over all prime
divisors $P$ of $L$, where the exponents $v_P$ are integers only finitely many of which are
nonzero. The divisor is integral if $v_P \ge 0$ for all $P$.

The set $K'$ of all elements of $L$ which satisfy monic polynomial equations with
coefficients from $K$ is a subfield containing $K$, and $L$ is also a field of algebraic functions
of one variable over $K'$. It is easily shown that no element of $L \setminus R$ satisfies a monic
polynomial equation with coefficients from $R$. Consequently $K' \subset R$ and the notion
of prime divisor is the same whether we consider $L$ to be over $K$ or over $K'$. Since
$(K')' = K'$, we may assume from the outset that $K' = K$. The elements of $K$ will
then be called constants and the elements of $L$ functions.

Suppose now that the field of constants $K$ is a finite field $\mathbb{F}_q$, containing $q$ elements.
We define the norm $N(P)$ of a prime divisor $P$ to be the cardinality of the associated
residue field $R/P$ and the norm of an integral divisor $A = \prod_P P^{v_P}$ to be

$$N(A) = \prod_P N(P)^{v_P}.$$


It may be shown that, for each positive integer $m$, there exist only finitely many prime
divisors of norm $q^m$. Moreover, for $\Re s > 1$ the zeta-function of $L$ can be defined by

$$\zeta_L(s) = \sum_A N(A)^{-s},$$

where the sum is over all integral divisors of $L$, and then

$$\zeta_L(s) = \prod_P \left(1 - N(P)^{-s}\right)^{-1},$$

where the product is over all prime divisors of $L$.
This seems quite similar to the number field case, but the function field case is
actually simpler. F.K. Schmidt (1931) deduced from the Riemann–Roch theorem that
there exists a polynomial $p(w)$ of even degree $2g$, with integer coefficients and constant
term 1, such that

$$\zeta_L(s) = \frac{p(q^{-s})}{(1 - q^{-s})(1 - q^{1-s})},$$

and that the zeta-function satisfies the functional equation

$$q^{(g-1)s} \zeta_L(s) = q^{(g-1)(1-s)} \zeta_L(1 - s).$$

The non-negative integer $g$ is the genus of the field of algebraic functions.

The analogue of the Riemann hypothesis, that all zeros of $\zeta_L(s)$ lie on the line
$\Re s = 1/2$, is equivalent to the statement that all zeros of the polynomial $p(w)$ have
absolute value $q^{-1/2}$, or that the number $N$ of prime divisors with norm $q$ satisfies the
inequality

$$|N - (q + 1)| \le 2g\, q^{1/2}.$$


This analogue has been proved by Weil (1948). A simpler proof has been given by 
Bombieri (1974), using ideas of Stepanov (1969). 
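For genus $g = 1$ the inequality can be checked by direct counting. The sketch below goes beyond the text. It assumes that $L$ is the function field of an elliptic curve $y^2 = x^3 + ax + b$ over $\mathbb{F}_p$, with $p > 3$ and $4a^3 + 27b^2 \not\equiv 0 \bmod p$. In that case the prime divisors of norm $p$ correspond to the $\mathbb{F}_p$-points of the projective curve, including the point at infinity. The function names are hypothetical.

```python
import math

def count_points(a, b, p):
    """Number of F_p-points on the projective curve y^2 = x^3 + a*x + b."""
    # square_count[r] = number of y in F_p with y^2 = r
    square_count = [0] * p
    for y in range(p):
        square_count[y * y % p] += 1
    affine = sum(square_count[(x ** 3 + a * x + b) % p] for x in range(p))
    return affine + 1  # add the single point at infinity

def hasse_gap(a, b, p):
    """Return |N - (p + 1)| and the bound 2 g p^(1/2) with g = 1."""
    assert p > 3 and (4 * a ** 3 + 27 * b ** 2) % p != 0, "curve must be nonsingular"
    n = count_points(a, b, p)
    return abs(n - (p + 1)), 2 * math.sqrt(p)

print(count_points(1, 1, 5))  # 9
for p in (101, 1009, 10007):
    gap, bound = hasse_gap(1, 1, p)
    print(p, gap, round(bound, 2), gap <= bound)
```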

The theory of function fields can also be given a geometric formulation. The prime 
divisors of a function field L with field of constants K can be regarded as the points 
of a non-singular projective curve over K, and vice versa. Weil (1949) conjectured 
far-reaching generalizations of the preceding results for curves over a finite field to 
algebraic varieties of higher dimension. 

Let $V$ be a nonsingular projective variety of dimension $d$, defined by homogeneous
polynomials with coefficients in $\mathbb{Z}$. For any prime $p$, let $V_p$ be the (possibly singular)
variety defined by reducing the coefficients mod $p$ and consider the formal power
series

$$Z_p(T) := \exp\Big(\sum_{n \ge 1} N_n(p)\, T^n/n\Big),$$

where $N_n(p)$ denotes the number of points of $V_p$ defined over the finite field $\mathbb{F}_{p^n}$.
Weil conjectured that, if $V_p$ is a nonsingular projective variety of dimension $d$ over
$\mathbb{F}_p$, then

(i) $Z_p(T)$ is a rational function of $T$,
(ii) $Z_p(1/(p^d T)) = \pm p^{de/2} T^e Z_p(T)$ for some integer $e$,
(iii) $Z_p(T)$ has a factorization of the form
$$Z_p(T) = \frac{P_1(T) P_3(T) \cdots P_{2d-1}(T)}{P_0(T) P_2(T) \cdots P_{2d}(T)},$$
where $P_0(T) = 1 - T$, $P_{2d}(T) = 1 - p^d T$ and $P_j(T) \in \mathbb{Z}[T]$ $(0 < j < 2d)$,
(iv) $P_j(T) = \prod_{k=1}^{b_j} (1 - \alpha_{jk} T)$, where $|\alpha_{jk}| = p^{j/2}$ $(1 \le k \le b_j,\ 0 < j < 2d)$.
The Weil conjectures have a topological significance, since the integer $e$ in (ii) is
the Euler characteristic of the original variety $V$, regarded as a complex manifold, and
$b_j$ in (iv) is its $j$-th Betti number.
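As a check on (i)–(iv) in the simplest case, take $V$ to be the projective line, so that $d = 1$. This example is our own and is not from the text. Over $\mathbb{F}_{p^n}$ the line has $N_n(p) = p^n + 1$ points, and

$$Z_p(T) = \exp\Big(\sum_{n \ge 1} \frac{(p^n + 1)T^n}{n}\Big) = \exp\big(-\log(1 - pT) - \log(1 - T)\big) = \frac{1}{(1 - T)(1 - pT)},$$

$$Z_p\Big(\frac{1}{pT}\Big) = \frac{1}{\big(1 - \frac{1}{pT}\big)\big(1 - \frac{1}{T}\big)} = \frac{pT^2}{(pT - 1)(T - 1)} = pT^2\, Z_p(T).$$

Thus $Z_p$ is rational, and (ii) holds with sign $+$ and $e = 2$, the Euler characteristic of the Riemann sphere. Condition (iii) holds with $P_0(T) = 1 - T$, $P_2(T) = 1 - pT$ and $P_1(T) = 1$, so that $b_1 = 0$, the first Betti number of the sphere.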

Conjecture (i) was proved by Dwork (1960). The remaining conjectures were 
proved by Deligne (1974), using ideas of Grothendieck. The most difficult part is 
the proof that $|\alpha_{jk}| = p^{j/2}$ (the Riemann hypothesis for varieties over finite fields).
Deligne’s proof is a major achievement of 20th century mathematics, but unfortunately 
of a different order of difficulty than anything which will be proved here. 

An analogue for function fields of Artin’s primitive root conjecture was already 
proved by Bilharz (1937), assuming the Riemann hypothesis for this case. Function 
fields have been used by Goppa (1981) to construct linear codes. Good codes are 
obtained when the number of prime divisors is large compared to the genus, and this 
can be guaranteed by means of the Riemann ‘hypothesis’. 

Carlitz and Uchiyama (1957) used the Riemann hypothesis for function fields
to obtain useful estimates for exponential sums in one variable, and Deligne (1977)
showed that these estimates could be extended to exponential sums in several variables.
Let $\mathbb{F}_p$ be the field of $p$ elements, where $p$ is a prime, and let $f \in \mathbb{F}_p[u_1, \ldots, u_n]$ be
a polynomial in $n$ variables of degree $d \ge 1$ with coefficients from $\mathbb{F}_p$, which is not
of the form $g^p - g + b$, where $b \in \mathbb{F}_p$ and $g \in \mathbb{F}_p[u_1, \ldots, u_n]$. (This condition is
certainly satisfied if $d < p$.) Then

$$\Big|\sum_{x_1, \ldots, x_n \in \mathbb{F}_p} e^{2\pi i f(x_1, \ldots, x_n)/p}\Big| \le (d - 1)\, p^{n - 1/2}.$$

We mention one more application of the Weil conjectures. Ramanujan’s tau-function
is defined by

$$q \prod_{n=1}^{\infty} (1 - q^n)^{24} = \sum_{n=1}^{\infty} \tau(n) q^n.$$

It was conjectured by Ramanujan (1916), and proved by Mordell (1920), that

$$\sum_{n=1}^{\infty} \tau(n)/n^s = \prod_p \left(1 - \tau(p) p^{-s} + p^{11-2s}\right)^{-1},$$

where the product is over all primes $p$. Ramanujan additionally conjectured that
$|\tau(p)| \le 2p^{11/2}$ for all $p$, and Deligne (1968/9) showed that this was a consequence
of the (at that time unproven) Weil conjectures.
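The defining product can be expanded directly. The sketch below uses a function name of our own. It multiplies out $\prod (1 - q^n)^{24}$ as a truncated power series in exact integer arithmetic, reads off $\tau(n)$, and checks two things for small arguments: Ramanujan's inequality, and the relations $\tau(mn) = \tau(m)\tau(n)$ for coprime $m, n$ and $\tau(p^2) = \tau(p)^2 - p^{11}$ that follow from the Euler product.

```python
def ramanujan_tau(N):
    """Return {n: tau(n)} for 1 <= n <= N from q * prod_{n>=1} (1 - q^n)^24."""
    # coeffs[k] = coefficient of q^k in prod (1 - q^n)^24, truncated after q^(N-1)
    coeffs = [0] * N
    coeffs[0] = 1
    for n in range(1, N):
        for _ in range(24):
            # multiply by (1 - q^n); go downwards so coeffs[k - n] still holds the old value
            for k in range(N - 1, n - 1, -1):
                coeffs[k] -= coeffs[k - n]
    # the extra factor q shifts every exponent up by one
    return {k: coeffs[k - 1] for k in range(1, N + 1)}

tau = ramanujan_tau(50)
print([tau[n] for n in range(1, 8)])  # [1, -24, 252, -1472, 4830, -6048, -16744]
primes = [p for p in range(2, 51) if all(p % d for d in range(2, p))]
print(all(abs(tau[p]) <= 2 * p ** 5.5 for p in primes))           # True
print(tau[6] == tau[2] * tau[3], tau[4] == tau[2] ** 2 - 2 ** 11)  # True True
```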

The prime number theorem also has an interesting analogue in the theory of 
dynamical systems. Let M be a compact Riemannian manifold with negative sectional 
curvatures, and let N(T) denote the number of different (oriented) closed geodesics 
on $M$ of length $\le T$. It was first shown by Margulis (1970) that

$$N(T) \sim e^{hT}/hT \quad \text{as } T \to \infty,$$

where the positive constant $h$ is the topological entropy of the associated geodesic flow.
Although much of the detail is specific to the problem, a proof may be given which
has the same structure as the proof in §3 of the prime number theorem. If $P$ is an
arbitrary closed orbit of the geodesic flow and $\lambda(P)$ its least period, one shows that the
zeta-function

$$Z(s) = \prod_P \left(1 - e^{-s\lambda(P)}\right)^{-1}$$

is nonzero and holomorphic for $\Re s \ge h$, except for a simple pole at $s = h$, and then
applies Ikehara’s theorem. The study of geodesics on a surface of negative curvature 
was initiated by Hadamard (1898), but it is unlikely that he realized there was a 
connection with the prime number theorem which he had proved two years earlier! 


6 Alternative Formulations 
There is an intimate connection between the Dirichlet products considered in §3 of
Chapter III and Dirichlet series. It is easily seen that if the Dirichlet series

$$f(s) = \sum_{n=1}^{\infty} a(n)/n^s, \qquad g(s) = \sum_{n=1}^{\infty} b(n)/n^s,$$

are absolutely convergent for $\Re s > \alpha$, then the product $h(s) = f(s)g(s)$ may also be
represented by an absolutely convergent Dirichlet series for $\Re s > \alpha$:

$$h(s) = \sum_{n=1}^{\infty} c(n)/n^s,$$

where $c = a * b$, i.e.

$$c(n) = \sum_{d \mid n} a(d)\, b(n/d) = \sum_{d \mid n} a(n/d)\, b(d).$$


This implies, in particular, that

$$\zeta^2(s) = \sum_{n=1}^{\infty} \tau(n)/n^s \quad (\Re s > 1), \qquad \zeta(s)\zeta(s-1) = \sum_{n=1}^{\infty} \sigma(n)/n^s \quad (\Re s > 2),$$

where, as in Chapter III (not as in §5),

$$\tau(n) = \sum_{d \mid n} 1, \qquad \sigma(n) = \sum_{d \mid n} d$$

denote respectively the number of positive divisors of $n$ and the sum of the positive
divisors of $n$. The relation for Euler’s phi-function,

$$\sigma(n) = \sum_{d \mid n} \tau(n/d)\, \varphi(d),$$

which was proved in Chapter III, now yields, for $\Re s > 2$,

$$\zeta(s-1)/\zeta(s) = \sum_{n=1}^{\infty} \varphi(n)/n^s.$$
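Each of these identities between Dirichlet series is equivalent to an identity between Dirichlet products, and those can be checked numerically. In the following Python sketch all names are ours. Arithmetic functions are plain Python functions, and `dirichlet_product` returns $c(1), \ldots, c(N)$ for $c = a * b$. The sketch confirms $\tau = 1 * 1$, $\sigma = \mathrm{id} * 1$, $\sigma = \tau * \varphi$ and $\mu * 1 = \delta$ (where $\delta(1) = 1$ and $\delta(n) = 0$ for $n > 1$) for all $n \le N$.

```python
from math import gcd

def dirichlet_product(a, b, N):
    """[c(1), ..., c(N)] with c(n) = sum_{d | n} a(d) b(n/d), stored from index 1."""
    c = [0] * (N + 1)
    for d in range(1, N + 1):
        for k in range(1, N // d + 1):
            c[d * k] += a(d) * b(k)
    return c

def one(n): return 1
def ident(n): return n
def tau(n): return sum(1 for d in range(1, n + 1) if n % d == 0)
def sigma(n): return sum(d for d in range(1, n + 1) if n % d == 0)
def phi(n): return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def mu(n):
    """Moebius function by trial division."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0  # n had a square factor
            result = -result
        d += 1
    return -result if n > 1 else result

N = 200
assert dirichlet_product(one, one, N)[1:] == [tau(n) for n in range(1, N + 1)]
assert dirichlet_product(ident, one, N)[1:] == [sigma(n) for n in range(1, N + 1)]
assert dirichlet_product(tau, phi, N)[1:] == [sigma(n) for n in range(1, N + 1)]
assert dirichlet_product(mu, one, N)[1:] == [1] + [0] * (N - 1)
print("all identities hold for n <=", N)
```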


From the property by which we defined the Möbius function we obtain also, for
$\Re s > 1$,

$$1/\zeta(s) = \sum_{n=1}^{\infty} \mu(n)/n^s.$$

In view of this relation it is not surprising that the distribution of prime numbers is
closely connected with the behaviour of the Möbius function. Put

$$M(x) = \sum_{n \le x} \mu(n).$$

Since $|\mu(n)| \le 1$, it is obvious that $|M(x)| \le \lfloor x \rfloor$ for $x > 0$. The next result is not so
obvious:


Proposition 12 $M(x)/x \to 0$ as $x \to \infty$.


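Proof The function $f(s) := \zeta(s) + 1/\zeta(s)$ is holomorphic for $\sigma \ge 1$ (since $\zeta(s)$ has no
zeros on the line $\sigma = 1$), except for a simple pole with residue 1 at $s = 1$. Moreover

$$f(s) = \sum_{n=1}^{\infty} \{1 + \mu(n)\}/n^s = \int_{1-}^{\infty} x^{-s}\, d\phi(x) \quad \text{for } \sigma > 1,$$

where $\phi(x) = \lfloor x \rfloor + M(x)$ is a nondecreasing function. Since

$$f(s) = \int_0^{\infty} e^{-su}\, d\phi(e^u),$$

it follows from Ikehara’s Theorem 9 that $\phi(x) \sim x$, i.e. $M(x) = \phi(x) - \lfloor x \rfloor = o(x)$.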


Proposition 12 is equivalent to the prime number theorem in the sense that either of
the relations $M(x) = o(x)$, $\psi(x) \sim x$ may be deduced from the other by elementary
(but not trivial) arguments.
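The cancellation in $M(x)$ is easy to observe numerically. The Python sketch below uses names of our own. It computes $\mu(n)$ for $n \le N$ with a linear sieve and tabulates $M(x)$ and $M(x)/x$. It illustrates Proposition 12 but of course proves nothing about the limit.

```python
def mobius_sieve(N):
    """List mu[0..N] computed with a linear sieve (mu[0] is set to 0 and unused)."""
    mu = [1] * (N + 1)
    mu[0] = 0
    composite = bytearray(N + 1)
    primes = []
    for i in range(2, N + 1):
        if not composite[i]:
            primes.append(i)
            mu[i] = -1
        for p in primes:
            if i * p > N:
                break
            composite[i * p] = 1
            if i % p == 0:
                mu[i * p] = 0  # p^2 divides i * p
                break
            mu[i * p] = -mu[i]
    return mu

def mertens(N):
    """List M[0..N] with M[x] = sum_{n <= x} mu(n)."""
    mu = mobius_sieve(N)
    M = [0] * (N + 1)
    for n in range(1, N + 1):
        M[n] = M[n - 1] + mu[n]
    return M

M = mertens(10**5)
for x in (10, 10**2, 10**3, 10**4, 10**5):
    print(x, M[x], M[x] / x)
```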

The Riemann hypothesis also has an equivalent formulation in terms of the function
$M(x)$. Suppose

$$M(x) = O(x^{\alpha}) \quad \text{as } x \to \infty,$$

for some $\alpha$ such that $0 < \alpha < 1$. For $\sigma > 1$ we have

$$1/\zeta(s) = \int_{1-}^{\infty} x^{-s}\, dM(x) = s \int_1^{\infty} x^{-s-1} M(x)\, dx.$$

But for $\sigma > \alpha$ the integral on the right is convergent and defines a holomorphic
function. Consequently it is the analytic continuation of $1/\zeta(s)$. Thus if $\alpha^*$ again denotes
the least upper bound of the real parts of all zeros of $\zeta(s)$, then $\alpha \ge \alpha^* \ge 1/2$. On the
other hand, Littlewood (1912) showed that

$$M(x) = O(x^{\alpha^* + \varepsilon}) \quad \text{for every } \varepsilon > 0.$$

It follows that the Riemann hypothesis holds if and only if $M(x) = O(x^{\alpha})$ for every
$\alpha > 1/2$.


It has already been mentioned that the first $1.5 \times 10^9$ zeros of $\zeta(s)$ on the line
$\sigma = 1/2$ are all simple. It is likely that the Riemann hypothesis does not tell the
whole story and that all zeros of $\zeta(s)$ on the line $\sigma = 1/2$ are simple. Thus it is
of interest that this is guaranteed by a sufficiently sharp bound for $M(x)$. We will
show that if

$$M(x) = O(x^{1/2} \log^{\alpha} x) \quad \text{as } x \to \infty,$$

for some $\alpha < 1$, then not only do all nontrivial zeros of $\zeta(s)$ lie on the line $\sigma = 1/2$
but they are all simple.


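Let $\rho = 1/2 + i\gamma$ be a zero of $\zeta(s)$ of multiplicity $m \ge 1$ and take $s = \rho + h$,
where $h > 0$. Then $\sigma = 1/2 + h$ and, since

$$1/\zeta(s) = s \int_1^{\infty} x^{-s-1} M(x)\, dx \quad \text{for } \sigma > 1/2,$$

we have

$$\begin{aligned} |1/\zeta(s)| &\le |s| \int_1^{\infty} x^{-\sigma-1} |M(x)|\, dx = O(|s|) \int_1^{\infty} x^{-1-h} \log^{\alpha} x\, dx \\ &= O(|s|) \int_0^{\infty} e^{-hu} u^{\alpha}\, du = O(|s|)\, \Gamma(\alpha + 1)/h^{\alpha + 1}. \end{aligned}$$

Thus $h^{\alpha+1} |1/\zeta(s)|$ is bounded for $h \to +0$. Since $|1/\zeta(\rho + h)|$ is of exact order $h^{-m}$
as $h \to +0$, it follows that $m \le \alpha + 1$. Since $m$ is an integer and $\alpha < 1$, this implies
$m = 1$ and $\alpha \ge 0$.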

The prime number theorem, in the form $M(x) = o(x)$, says that asymptotically
$\mu(n)$ takes the values $+1$ and $-1$ with equal probability. By assuming that actually
the values $\mu(n)$ asymptotically behave like independent random variables, Good and
Churchhouse (1968) have been led to two striking conjectures, analogous to the central
limit theorem and the law of the iterated logarithm in the theory of probability:

Conjecture A If $N(n) \to \infty$ and $\log N/\log n \to 0$, then

$$\Pr\left\{\frac{M(m + N) - M(m)}{(6N/\pi^2)^{1/2}} < t\right\} \to (2\pi)^{-1/2} \int_{-\infty}^{t} e^{-u^2/2}\, du,$$

where

$$\Pr\{f(m) < t\} = \#\{m \le n : f(m) < t\}/n.$$
Conjecture B

$$\limsup_{x \to \infty} M(x)(2x \log\log x)^{-1/2} = \sqrt{6}/\pi = -\liminf_{x \to \infty} M(x)(2x \log\log x)^{-1/2}.$$

By what has been said, Conjecture B implies not only the Riemann hypothesis,
but also that the zeros of $\zeta(s)$ are all simple. These probabilistic conjectures provide
a more interesting reason than symmetry for believing in the validity of the Riemann
hypothesis, but no progress has so far been made towards proving them.


7 Some Further Problems 


A prime p is said to be a twin prime if p + 2 is also a prime. For example, 41 is a 
twin prime since both 41 and 43 are primes. It is still not known if there are infinitely 
many twin primes. However Brun (1919), using the sieve method which he devised 
for the purpose, showed that, if infinite, the sum of the reciprocals of all twin primes 
converges. Since the sum of the reciprocals of all primes diverges, this means that few 
primes are twin primes. 

By a formal application of their circle method Hardy and Littlewood (1923) were
led to conjecture that

$$\pi_2(x) \sim L_2(x) \quad \text{for } x \to \infty,$$

where $\pi_2(x)$ denotes the number of twin primes $\le x$,

$$L_2(x) = 2C_2 \int_2^x dt/\log^2 t$$

and

$$C_2 = \prod_{p \ge 3} \left(1 - 1/(p - 1)^2\right) = 0.66016181\ldots.$$

This implies that $\pi_2(x)/\pi(x) \sim 2C_2/\log x$. Table 2, adapted from Brent (1975),
shows that Hardy and Littlewood’s formula agrees well with the facts. Brent also
calculates

$$\sum_{\text{twin } p \le 10^{10}} \left(1/p + 1/(p + 2)\right) = 1.78748\ldots$$

and, using the Hardy–Littlewood formula for the tail, obtains the estimate

$$\sum_{\text{all twin } p} \left(1/p + 1/(p + 2)\right) = 1.90216\ldots$$

His calculations have been considerably extended by Nicely (1995).
Table 2.
x         π_2(x)      L_2(x)      π_2(x)/L_2(x)
10^3      35          46          0.76
10^4      205         214         0.96
10^5      1224        1249        0.980
10^6      8169        8248        0.9904
10^7      58980       58754       1.0038
10^8      440312      440368      0.99987
10^9      3424506     3425308     0.99977
10^10     27412679    27411417    1.000046
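The first rows of Table 2 are easy to reproduce. The Python sketch below uses names of our own. It sieves for primes, counts twin primes $p \le x$, and evaluates $L_2(x)$ by Simpson's rule after the substitution $t = e^u$, using the value $C_2 = 0.6601618158$ quoted above.

```python
import math

C2 = 0.6601618158  # twin prime constant, as quoted in the text

def prime_flags(n):
    """is_prime[k] for 0 <= k <= n (sieve of Eratosthenes)."""
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if is_prime[i]:
            is_prime[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return is_prime

def pi2(x):
    """Number of twin primes p <= x, i.e. primes p for which p + 2 is also prime."""
    flags = prime_flags(x + 2)
    return sum(1 for p in range(2, x + 1) if flags[p] and flags[p + 2])

def L2(x, steps=20000):
    """2 * C2 * integral from 2 to x of dt / log(t)^2, via t = e^u and Simpson's rule."""
    a, b = math.log(2), math.log(x)
    h = (b - a) / steps  # steps must be even for Simpson's rule
    g = lambda u: math.exp(u) / u ** 2
    s = g(a) + g(b)
    s += 4 * sum(g(a + k * h) for k in range(1, steps, 2))
    s += 2 * sum(g(a + k * h) for k in range(2, steps, 2))
    return 2 * C2 * s * h / 3

for x in (10**3, 10**4, 10**5):
    print(x, pi2(x), round(L2(x)), round(pi2(x) / L2(x), 3))
```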


Besides the twin prime formula many other asymptotic formulae were conjectured 
by Hardy and Littlewood. Most of them are contained in a general conjecture, which 
will now be described. 

Let $f(t)$ be a polynomial in $t$ of positive degree with integer coefficients. If $f(n)$
is prime for infinitely many positive integers $n$, then $f$ has positive leading coefficient,
$f$ is irreducible over the field $\mathbb{Q}$ of rational numbers and, for each prime $p$,
there is a positive integer $n$ for which $f(n)$ is not divisible by $p$. It was conjectured by
Bouniakowsky (1857) that conversely, if these three conditions are satisfied, then $f(n)$
is prime for infinitely many positive integers $n$. Schinzel (1958) extended the conjecture
to several polynomials and Bateman and Horn (1962) gave Schinzel’s conjecture
the following quantitative form.

Let $f_j(t)$ be a polynomial in $t$ of degree $d_j \ge 1$, with integer coefficients and positive
leading coefficient, which is irreducible over the field $\mathbb{Q}$ of rational numbers $(j = 1, \ldots, m)$.
Suppose also that the polynomials $f_1(t), \ldots, f_m(t)$ are distinct and that,
for each prime $p$, there is a positive integer $n$ for which the product $f_1(n) \cdots f_m(n)$ is
not divisible by $p$. Bateman and Horn’s conjecture states that, if $N(x)$ is the number
of positive integers $n \le x$ for which $f_1(n), \ldots, f_m(n)$ are all primes, then

$$N(x) \sim (d_1 \cdots d_m)^{-1} C(f_1, \ldots, f_m) \int_2^x dt/\log^m t,$$

where

$$C(f_1, \ldots, f_m) = \prod_p \left\{(1 - 1/p)^{-m} (1 - \omega(p)/p)\right\},$$

the product being taken over all primes $p$ and $\omega(p)$ denoting the number of $u \in \mathbb{F}_p$ (the
field of $p$ elements) such that $f_1(u) \cdots f_m(u) = 0$. (The convergence of the infinite
product when the primes are taken in their natural order follows from the prime ideal
theorem.)

The twin prime formula is obtained by taking $m = 2$ and $f_1(t) = t$, $f_2(t) = t + 2$.
By taking instead $f_1(t) = t$, $f_2(t) = 2t + 1$, the Bateman–Horn conjecture gives the
same asymptotic formula $\pi_G(x) \sim L_2(x)$ for the number $\pi_G(x)$ of primes $p \le x$ for
which $2p + 1$ is also a prime (‘Sophie Germain’ primes). By taking $m = 1$ and
$f_1(t) = t^2 + 1$ one obtains an asymptotic formula for the number of primes of the form $n^2 + 1$.
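The constant $C(f_1, \ldots, f_m)$ is easy to approximate once $\omega(p)$ is known. The sketch below uses names of our own. It computes $\omega(p)$ by brute force over $\mathbb{F}_p$ and truncates the product at primes $p \le P$. For both $(t, t + 2)$ and $(t, 2t + 1)$ it should approach $2C_2 \approx 1.3203$, in line with the remark that both choices give $L_2(x)$. Passing the polynomials as Python functions is a choice made only for this illustration.

```python
def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]

def omega(polys, p):
    """Number of u in F_p with f_1(u) * ... * f_m(u) = 0 mod p."""
    # p is prime, so the product vanishes iff some factor vanishes
    return sum(1 for u in range(p) if any(f(u) % p == 0 for f in polys))

def bateman_horn_constant(polys, P):
    """Truncated product over primes p <= P of (1 - 1/p)^(-m) * (1 - omega(p)/p)."""
    m = len(polys)
    c = 1.0
    for p in primes_up_to(P):
        c *= (1 - 1 / p) ** (-m) * (1 - omega(polys, p) / p)
    return c

twin = [lambda t: t, lambda t: t + 2]
germain = [lambda t: t, lambda t: 2 * t + 1]
print(bateman_horn_constant(twin, 2000))     # close to 2 * 0.6601618 = 1.3203236
print(bateman_horn_constant(germain, 2000))  # the same truncated value
```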
Bateman and Horn gave a heuristic derivation of their formula. However, the only
case in which the formula has actually been proved is $m = 1$, $d_1 = 1$. This is the case
of primes in an arithmetic progression which will be considered in the next chapter. 
When one considers the vast output of mathematical papers today compared with 
previous eras, it is salutary to recall that we still do not know as much about twin 
primes as Euclid knew about primes. 


8 Further Remarks 


The historical development of the prime number theorem is traced in Landau [33]. The 
original papers of Chebyshev are available in [56]. Pintz [48] has given a simple proof 
of Chebyshev’s result that $\pi(x) = x/(A\log x - B + o(1))$ implies $A = B = 1$.

There is an English translation of Riemann’s memoir in Edwards [20]. Complex 
variable proofs of the prime number theorem, with error term, are contained in the 
books of Ayoub [4], Ellison and Ellison [21], and Patterson [47]. For a simple complex 
variable proof without error term, due to Newman (1980), see Zagier [63]. 

A proof with error term by the Wiener-Ikehara method is given in Čížek [12].
Wiener’s general Tauberian theorem is proved in Rudin [52]. For its algebraic 
interpretation, see the resumé of Fourier analysis in [13]. The development of Selberg’s 
method is surveyed in Diamond [18]. An elementary proof of the prime number 
theorem which is quite different from that of Selberg and Erdős has been given by
Daboussi [15]. 

A clear account of Stieltjes integrals is given in Widder [62]. However, we do not 
use Stieltjes integrals in any essential way, but only for the formal convenience of 
treating integration by parts and summation by parts in the same manner. Widder’s 
book also contains the Wiener—Ikehara proof of the prime number theorem. 

By a theorem of S. Bernstein (1928), proved in Widder’s book and also in
Mattner [38], the hypotheses of Proposition 7 can be stated without reference to
the function $\phi(x)$. Bernstein’s theorem says that a real-valued function $F(\sigma)$ can be
represented in the form

$$F(\sigma) = \int_0^{\infty} e^{-\sigma x}\, d\phi(x),$$

where $\phi(x)$ is a nondecreasing function for $x \ge 0$ and the integral is convergent for
every $\sigma > 1$, if and only if $F(\sigma)$ has derivatives of all orders and

$$(-1)^k F^{(k)}(\sigma) \ge 0 \quad \text{for every } \sigma > 1 \quad (k = 0, 1, 2, \ldots).$$


For the Poisson summation formula see, for example, Lasser [34] and Duran 
et al. [19]. There is a useful n-dimensional generalization, discussed more fully in 
§7 of Chapter XII, in which a sum over all points of a lattice is related to a sum over
all points of the dual lattice. Further generalizations are mentioned in Chapter X. 

More extended treatments of the gamma function are given in Andrews et al. [3] 
and Remmert [49]. 

More information about the Riemann zeta-function is given in the books of 
Patterson [47], Titchmarsh [57], and Karatsuba and Voronin [30]. For numerical data, 
see Rosser and Schoenfeld [50], van de Lune et al. [37] and Rumely [53]. 
For a proof that $\pi(x) - \mathrm{Li}(x)$ changes sign infinitely often, see Diamond [17].
Estimates for values of $x$ such that $\pi(x) > \mathrm{Li}(x)$ are obtained by a technique due to
Lehman [35]; for the most recent estimate, see Bays and Hudson [8]. 

For the pair correlation conjecture, see Montgomery [40], Goldston [24] and 
Odlyzko [45]. Random matrices are thoroughly discussed by Mehta [39]; for a nice 
introduction, see Tracy and Widom [58]. 

For Dedekind zeta functions see Stark [54], besides the books on algebraic number 
theory referred to in Chapter II. The prime ideal theorem is proved in Narkiewicz [44], 
for example. For consequences of the extended Riemann hypothesis, see Bach [5], 
Goldstein [23] and M.R. Murty [41]. Many other generalizations of the zeta function 
are discussed in the article on zeta functions in [22]. 

Function fields are treated in the books of Chevalley [11] and Deuring [16]. The 
lengthy review of Chevalley’s book by Weil in Bull. Amer. Math. Soc. 57 (1951), 
384-398, is useful but over-critical. Even if geometric methods are better adapted for 
algebraic varieties of higher dimension, the algebraic methods available for curves 
are essentially simpler. Moreover it was the close analogy with number fields that 
suggested the possibility of a Riemann hypothesis for function fields. For a proof of 
the latter, see Bombieri [9]. For the Weil conjectures, see Weil [61] and Katz [32]. 

Stichtenoth [55] gives a good account of the theory of function fields with spe- 
cial emphasis on its applications to coding theory. For these applications, see also 
Goppa [26], Tsfasman et al. [60], and Tsfasman and Vladut [59]. Curves with a given 
genus which have the maximal number of $\mathbb{F}_q$-points are discussed by Cossidente
et al. [14]. 

For introductions to Ramanujan’s tau-function, see V.K. Murty [42] and Rankin’s 
article (pp. 245-268) in Andrews et al. [2]. For analogues of the prime number the- 
orem in the theory of dynamical systems, see Katok and Hasselblatt [31] and Parry 
and Pollicott [46]. Hadamard’s pioneering study of geodesics on a surface of negative 
curvature and his proof of the prime number theorem are both reproduced in [27]. 

The ‘equivalence’ of Proposition 12 with the prime number theorem is proved in 
Ayoub [4]. A proof that the Riemann hypothesis is equivalent to $M(x) = O(x^{\alpha})$ for
every $\alpha > 1/2$ is contained in the book of Titchmarsh [57]. Good and Churchhouse’s
probabilistic conjectures appeared in [25]. For the central limit theorem and the law of
the iterated logarithm see, for example, Adams [1], Kac [29], Bauer [7] and Loève [36].

Brun’s theorem on twin primes is proved in Narkiewicz [43]. For numerical results, 
see Brent [10]. For conjectural asymptotic formulas, see Hardy and Littlewood [28] 
and Bateman and Horn [6]. There are several heuristic derivations of the twin prime 
formula, the most recent being Rubenstein [51]. It would be useful to try to analyse 
these heuristic derivations, so that the conclusion is seen as a consequence of precisely 
stated assumptions. 
X
A Character Study 


1 Primes in Arithmetic Progressions 


Let $a$ and $m$ be integers with $1 \le a < m$. If $a$ and $m$ have a common divisor $d > 1$,
then no term after the first of the arithmetic progression

$$a,\ a + m,\ a + 2m,\ \ldots \qquad (*)$$

is a prime. Legendre (1788) conjectured, and later (1808) attempted a proof, that if
$a$ and $m$ are relatively prime, then the arithmetic progression $(*)$ contains infinitely
many primes.

If $a_1, \ldots, a_h$ are the positive integers less than $m$ and relatively prime to $m$, and if
$\pi_j(x)$ denotes the number of primes $\le x$ in the arithmetic progression

$$a_j,\ a_j + m,\ a_j + 2m,\ \ldots,$$

then Legendre’s conjecture can be stated in the form

$$\pi_j(x) \to \infty \quad \text{as } x \to \infty \quad (j = 1, \ldots, h).$$

Legendre (1830) subsequently conjectured, and again gave a faulty proof, that

$$\pi_j(x)/\pi_k(x) \to 1 \quad \text{as } x \to \infty \text{ for all } j, k.$$

Since the total number $\pi(x)$ of primes $\le x$ satisfies, for $x \ge m$,

$$\pi(x) = \pi_1(x) + \cdots + \pi_h(x) + c,$$

where $c$ is the number of different primes dividing $m$, Legendre’s second conjecture is
equivalent to

$$\pi_j(x)/\pi(x) \to 1/h \quad \text{as } x \to \infty \quad (j = 1, \ldots, h).$$

Here $h = \varphi(m)$ is the number of positive integers less than $m$ and relatively prime to $m$.
If one assumes the truth of the prime number theorem, then the second conjecture is
also equivalent to

$$\pi_j(x) \sim x/\varphi(m)\log x \quad (j = 1, \ldots, \varphi(m)).$$

The validity of the second conjecture in this form is known as the prime number theorem
for arithmetic progressions.

Legendre’s first conjecture was proved by Dirichlet (1837) in an outstanding pa- 
per which combined number theory, algebra and analysis. His algebraic innovation 
was the use of characters to isolate the primes belonging to a particular residue class 
mod m. Legendre’s second conjecture, which implies the first, was proved by de la 
Vallée Poussin (1896), again using characters, at the same time that he proved the 
ordinary prime number theorem. 

Selberg (1949), (1950) has given proofs of both conjectures which avoid the use of 
complex analysis, but they are not very illuminating. The prime number theorem for 
arithmetic progressions will be proved here by an extension of the method used in the 
previous chapter to prove the ordinary prime number theorem. 

For any integer $a$, with $1 \le a \le m$ and $(a, m) = 1$, let

$$\pi(x; m, a) = \sum_{p \le x,\ p \equiv a \bmod m} 1.$$

Also, generalizing the definition of Chebyshev’s functions in the previous chapter, put

$$\theta(x; m, a) = \sum_{p \le x,\ p \equiv a \bmod m} \log p, \qquad \psi(x; m, a) = \sum_{n \le x,\ n \equiv a \bmod m} \Lambda(n).$$

Exactly as in the last chapter, we can show that the prime number theorem for
arithmetic progressions,

$$\pi(x; m, a) \sim x/\varphi(m)\log x \quad \text{as } x \to \infty,$$

is equivalent to

$$\psi(x; m, a) \sim x/\varphi(m) \quad \text{as } x \to \infty.$$

It is in this form that the theorem will be proved.
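The theorem can be observed numerically before it is proved. The Python sketch below uses names of our own, and the choice $m = 4$, $x = 10^6$ is arbitrary. For every reduced residue $a$ it computes $\pi(x; m, a)$ and $\psi(x; m, a)$ with a sieve, then prints $\psi(x; m, a)\varphi(m)/x$. The theorem predicts that this ratio tends to 1.

```python
import math

def prime_list(n):
    """Sieve of Eratosthenes: all primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]

def ap_counts(x, m):
    """Return {a: (pi(x; m, a), psi(x; m, a))} for 1 <= a < m with gcd(a, m) = 1."""
    assert m > 1
    residues = [a for a in range(1, m) if math.gcd(a, m) == 1]
    pi = dict.fromkeys(residues, 0)
    psi = dict.fromkeys(residues, 0.0)
    for p in prime_list(x):
        if p % m in pi:  # skip primes dividing m
            pi[p % m] += 1
        # Lambda(p^k) = log p, so add log p once for each prime power p^k <= x
        pk = p
        while pk <= x:
            if pk % m in psi:
                psi[pk % m] += math.log(p)
            pk *= p
    return {a: (pi[a], psi[a]) for a in residues}

x, m = 10**6, 4
phi_m = sum(1 for a in range(1, m) if math.gcd(a, m) == 1)
for a, (pi_a, psi_a) in ap_counts(x, m).items():
    print(a, pi_a, round(psi_a * phi_m / x, 4))
```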


2 Characters of Finite Abelian Groups 


Let $G$ be an abelian group with identity element $e$. A character of $G$ is a function
$\chi : G \to \mathbb{C}$ such that

(i) $\chi(ab) = \chi(a)\chi(b)$ for all $a, b \in G$,
(ii) $\chi(c) \neq 0$ for some $c \in G$.

Since $\chi(c) = \chi(ca^{-1})\chi(a)$, by (i), it follows from (ii) that $\chi(a) \neq 0$ for every
$a \in G$. (Thus $\chi$ is a homomorphism of $G$ into the multiplicative group $\mathbb{C}^{\times}$ of nonzero
complex numbers.) Moreover, since $\chi(a) = \chi(a)\chi(e)$, we must have $\chi(e) = 1$. Since
$\chi(a)\chi(a^{-1}) = \chi(e)$, it follows that $\chi(a^{-1}) = \chi(a)^{-1}$.
The function $\chi_1 : G \to \mathbb{C}$ defined by $\chi_1(a) = 1$ for every $a \in G$ is obviously a
character of $G$, the trivial character (also called the principal character). Moreover,
for any character $\chi$ of $G$, the function $\chi^{-1} : G \to \mathbb{C}$ defined by $\chi^{-1}(a) = \chi(a)^{-1}$ is
also a character of $G$. Furthermore, if $\chi'$ and $\chi''$ are characters of $G$, then the function
$\chi'\chi'' : G \to \mathbb{C}$ defined by $\chi'\chi''(a) = \chi'(a)\chi''(a)$ is a character of $G$. Since

$$\chi_1\chi = \chi, \qquad \chi^{-1}\chi = \chi_1, \qquad \chi(\chi'\chi'') = (\chi\chi')\chi'', \qquad \chi'\chi'' = \chi''\chi',$$

it follows that the set $\hat{G}$ of all characters of $G$ is itself an abelian group, the dual group
of $G$, with the trivial character as identity element.

Suppose now that the group $G$ is finite, of order $g$ say. Then $\chi(a)$ is a $g$-th root of
unity for every $a \in G$, since $a^g = e$ and hence

$$\chi(a)^g = \chi(a^g) = \chi(e) = 1.$$

It follows that $|\chi(a)| = 1$ and $\chi^{-1}(a) = \overline{\chi(a)}$. Thus we will sometimes write $\bar{\chi}$
instead of $\chi^{-1}$.


Proposition 1 The dual group $\hat{G}$ of a finite abelian group $G$ is a finite abelian group
of the same order. Moreover, if $a \in G$ and $a \neq e$, then $\chi(a) \neq 1$ for some $\chi \in \hat{G}$.


Proof Let $g$ denote the order of $G$. Suppose first that $G$ is a cyclic group, generated
by the element $c$. Then any character $\chi$ of $G$ is uniquely determined by the value $\chi(c)$,
which is a $g$-th root of unity. Conversely if $\omega_j = e^{2\pi i j/g}$ $(0 \le j < g)$ is a $g$-th root
of unity, then the functions $\chi^{(j)} : G \to \mathbb{C}$ defined by $\chi^{(j)}(c^k) = \omega_j^k$ are distinct
characters of $G$ and $\chi^{(1)}(c^k) \neq 1$ for $1 \le k < g$. It follows that the proposition is true
when $G$ is cyclic. The general case can be reduced to this by using the fact (see §4 of
Chapter III) that any finite abelian group is a direct product of cyclic groups. However,
it can also be treated directly in the following way.

We use induction on $g$ and suppose that $G$ is not cyclic. Let $H$ be a maximal proper
subgroup of $G$ and let $h$ be the order of $H$. Let $a \in G \setminus H$ and let $r$ be the least positive
integer such that $b = a^r \in H$. Since $G$ is generated by $H$ and $a$, and $a^n \in H$ if and
only if $r$ divides $n$, each $x \in G$ can be uniquely expressed in the form

$$x = a^k y,$$

where $y \in H$ and $0 \le k < r$. Hence $g = rh$.
If $\chi$ is any character of $G$, its restriction to $H$ is a character $\psi$ of $H$. Moreover $\chi$
is uniquely determined by $\psi$ and the value $\chi(a)$, since

$$\chi(a^k y) = \chi(a)^k \psi(y).$$

Since $\chi(a)^r = \psi(b)$ is a root of unity, $\omega = \chi(a)$ is a root of unity such that $\omega^r = \psi(b)$.

Conversely, it is easily verified that, for each character $\psi$ of $H$ and for each of
the $r$ roots of unity $\omega$ such that $\omega^r = \psi(b)$, the function $\chi : G \to \mathbb{C}$ defined by
$\chi(a^k y) = \omega^k \psi(y)$ is a character of $G$. Since $H$ has exactly $h$ characters by the induction
hypothesis, it follows that $G$ has exactly $rh = g$ characters. It remains to show
that if $a^k y \neq e$, then $\chi(a^k y) \neq 1$ for some $\chi$. But if $\omega^k \psi(y) = 1$ for all $\omega$, then $k = 0$;
hence $y \neq e$ and $\chi(y) = \psi(y) \neq 1$ for some $\psi$, by the induction hypothesis.
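Proposition 2 Let $G$ be a finite abelian group of order $g$ and $\hat{G}$ its dual group. Then

(i)
$$\sum_{a \in G} \chi(a) = \begin{cases} g & \text{if } \chi = \chi_1, \\ 0 & \text{if } \chi \neq \chi_1; \end{cases}$$

(ii)
$$\sum_{\chi \in \hat{G}} \chi(a) = \begin{cases} g & \text{if } a = e, \\ 0 & \text{if } a \neq e. \end{cases}$$

Proof Put

$$S = \sum_{a \in G} \chi(a).$$

Since it is obvious that $S = g$ if $\chi = \chi_1$, we assume $\chi \neq \chi_1$. Then $\chi(b) \neq 1$ for some
$b \in G$. Since $ab$ runs through all elements of $G$ at the same time as $a$,

$$\chi(b) S = \sum_{a \in G} \chi(a)\chi(b) = \sum_{a \in G} \chi(ab) = S.$$

Since $\chi(b) \neq 1$, it follows that $S = 0$.

Now put

$$T = \sum_{\chi \in \hat{G}} \chi(a).$$

Evidently $T = g$ if $a = e$ since, by Proposition 1, $\hat{G}$ also has order $g$. Thus we now
assume $a \neq e$. By Proposition 1 also, for some $\psi \in \hat{G}$ we have $\psi(a) \neq 1$. Since $\chi\psi$
runs through all elements of $\hat{G}$ at the same time as $\chi$,

$$\psi(a) T = \sum_{\chi \in \hat{G}} \chi(a)\psi(a) = \sum_{\chi \in \hat{G}} \chi\psi(a) = T.$$

Since $\psi(a) \neq 1$, it follows that $T = 0$.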


Since the product of two characters is again a character, and since $\bar{\psi}$ is the inverse
of the character $\psi$, Proposition 2(i) can be stated in the apparently more general form

(i)′
$$\sum_{a \in G} \chi(a)\bar{\psi}(a) = \begin{cases} g & \text{if } \chi = \psi, \\ 0 & \text{if } \chi \neq \psi. \end{cases}$$

Similarly, since $\bar{\chi}(b) = \chi(b^{-1})$, Proposition 2(ii) can be stated in the form

(ii)′
$$\sum_{\chi \in \hat{G}} \chi(a)\bar{\chi}(b) = \begin{cases} g & \text{if } a = b, \\ 0 & \text{if } a \neq b. \end{cases}$$

The relations (i)′ and (ii)′ are known as the orthogonality relations, for the characters
and elements respectively, of a finite abelian group.
The finite abelian group in which we are interested is the multiplicative group $\mathbb{Z}_{(m)}^{\times}$ of
integers relatively prime to $m$, where $m > 1$ will be fixed from now on. The group
$G_m = \mathbb{Z}_{(m)}^{\times}$ has order $\varphi(m)$, where $\varphi(m)$ denotes as usual the number of positive
integers less than $m$ and relatively prime to $m$.

A Dirichlet character mod $m$ is defined to be a function $\chi : \mathbb{Z} \to \mathbb{C}$ with the
properties

(i) $\chi(ab) = \chi(a)\chi(b)$ for all $a, b \in \mathbb{Z}$,
(ii) $\chi(a) = \chi(b)$ if $a \equiv b \bmod m$,
(iii) $\chi(a) \neq 0$ if and only if $(a, m) = 1$.

Any character $\chi$ of $G_m$ can be extended to a Dirichlet character mod $m$ by putting
$\chi(a) = 0$ if $a \in \mathbb{Z}$ and $(a, m) \neq 1$. Conversely, on account of (ii), any Dirichlet
character mod $m$ uniquely determines a character of $G_m$.

To illustrate the definition, here are some examples of Dirichlet characters. In each
case we set $\chi(a) = 0$ if $(a, m) \neq 1$.

(I) $m = p$ is an odd prime and $\chi(a) = (a/p)$ if $p \nmid a$, where $(a/p)$ is the Legendre
symbol;
(II) $m = 4$ and $\chi(a) = 1$ or $-1$ according as $a \equiv 1$ or $-1 \bmod 4$;
(III) $m = 8$ and $\chi(a) = 1$ or $-1$ according as $a \equiv \pm 1$ or $\pm 3 \bmod 8$.


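We now return to the general case. By the results of the previous section we have

$$\sum_{n=1}^{m} \chi(n) = \begin{cases} \varphi(m) & \text{if } \chi = \chi_1, \\ 0 & \text{if } \chi \neq \chi_1, \end{cases}$$

and

$$\sum_{\chi} \chi(a) = \begin{cases} \varphi(m) & \text{if } a \equiv 1 \bmod m, \\ 0 & \text{otherwise}, \end{cases}$$

where $\chi$ runs through all Dirichlet characters mod $m$. Furthermore

$$\sum_{n=1}^{m} \bar{\psi}(n)\chi(n) = \begin{cases} \varphi(m) & \text{if } \chi = \psi, \\ 0 & \text{if } \chi \neq \psi, \end{cases}$$

and

$$\sum_{\chi} \bar{\chi}(a)\chi(b) = \begin{cases} \varphi(m) & \text{if } (a, m) = 1 \text{ and } a \equiv b \bmod m, \\ 0 & \text{otherwise}. \end{cases}$$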
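These relations can be verified directly when the modulus is an odd prime $p$. In that case $G_p$ is cyclic with a primitive root $c$ as generator, as in the first part of the proof of Proposition 1, and the characters are $\chi^{(r)}(c^k) = e^{2\pi i rk/(p-1)}$. The Python sketch below assumes a prime modulus only to keep the construction simple, and its names are ours. It builds all $p - 1$ characters from a table of discrete logarithms, checks the two orthogonality relations in floating point, and confirms that the character of order 2 is the Legendre symbol of example (I).

```python
import cmath
import math

def primitive_root(p):
    """Smallest generator of the multiplicative group mod the prime p."""
    q, factors, d = p - 1, set(), 2
    while d * d <= q:  # collect the distinct prime factors of p - 1
        while q % d == 0:
            factors.add(d)
            q //= d
        d += 1
    if q > 1:
        factors.add(q)
    for c in range(2, p):
        if all(pow(c, (p - 1) // r, p) != 1 for r in factors):
            return c
    return 1  # only reached for p = 2, where the group is trivial

def dirichlet_characters(p):
    """All p - 1 Dirichlet characters mod the prime p, as Python functions."""
    c = primitive_root(p)
    index, x = {}, 1
    for k in range(p - 1):  # discrete logarithm table: index[c^k mod p] = k
        index[x] = k
        x = x * c % p

    def make(r):
        def chi(a):
            a %= p
            if a == 0:
                return 0
            return cmath.exp(2j * math.pi * r * index[a] / (p - 1))
        return chi

    return [make(r) for r in range(p - 1)]

p = 11
chars = dirichlet_characters(p)
# sum_{n=1}^{p} conj(psi(n)) chi(n) equals p - 1 if chi = psi and 0 otherwise
rel_i = all(
    abs(sum(psi(n).conjugate() * chi(n) for n in range(1, p + 1)) - (p - 1 if u == v else 0)) < 1e-9
    for u, psi in enumerate(chars) for v, chi in enumerate(chars))
# sum over chi of conj(chi(a)) chi(b) equals p - 1 if a = b and 0 otherwise
rel_ii = all(
    abs(sum(chi(a).conjugate() * chi(b) for chi in chars) - (p - 1 if a == b else 0)) < 1e-9
    for a in range(1, p) for b in range(1, p))
# the character of order 2 coincides with the Legendre symbol (a/p)
quad = chars[(p - 1) // 2]
is_legendre = all(abs(quad(a) - (1 if pow(a, (p - 1) // 2, p) == 1 else -1)) < 1e-9
                  for a in range(1, p))
print(rel_i, rel_ii, is_legendre)  # True True True
```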


Lemma 3 If $\chi \neq \chi_1$ is a Dirichlet character mod $m$ then, for any positive
integer $N$,

$$\Big|\sum_{n=1}^{N} \chi(n)\Big| \le \varphi(m)/2.$$
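Proof Any positive integer $N$ can be written in the form $N = qm + r$, where $q \ge 0$ and $1 \le r \le m$. Since $\chi(a) = \chi(b)$ if $a \equiv b \bmod m$, we have

$$\sum_{n=1}^{N} \chi(n) = \Big(\sum_{n=1}^{m} + \sum_{n=m+1}^{2m} + \cdots + \sum_{n=(q-1)m+1}^{qm} + \sum_{n=qm+1}^{qm+r}\Big)\chi(n) = q\sum_{n=1}^{m} \chi(n) + \sum_{n=1}^{r} \chi(n).$$

But $\sum_{n=1}^{m} \chi(n) = 0$, since $\chi \neq \chi_1$. Hence

$$\sum_{n=1}^{N} \chi(n) = \sum_{n=1}^{r} \chi(n) = -\sum_{n=r+1}^{m} \chi(n).$$

Since $|\chi(n)| = 1$ or $0$ according as $(n, m) = 1$ or $(n, m) \neq 1$, and since $\varphi(m)$ is the number of positive integers $n \le m$ such that $(n, m) = 1$, the two sums on the right have together exactly $\varphi(m)$ nonzero terms, each of absolute value 1. One of them therefore has at most $\varphi(m)/2$ nonzero terms, and the result follows.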
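Lemma 3 can be tested on the real characters mod 4, mod 8 and mod 11 (the Legendre symbol). A minimal sketch with illustrative helper names; it computes the largest partial sum up to $N = 10\,000$ and compares it with $\varphi(m)/2$.

```python
from math import gcd

def euler_phi(m):
    return sum(1 for n in range(1, m + 1) if gcd(n, m) == 1)

def max_partial_sum(chi, N):
    """max over 1 <= n <= N of |chi(1) + ... + chi(n)|."""
    total, best = 0, 0
    for n in range(1, N + 1):
        total += chi(n)
        best = max(best, abs(total))
    return best

def chi_mod4(a):
    return 0 if a % 2 == 0 else (1 if a % 4 == 1 else -1)

def chi_mod8(a):
    return 0 if a % 2 == 0 else (1 if a % 8 in (1, 7) else -1)

def legendre_mod11(a):
    a %= 11
    return 0 if a == 0 else (1 if pow(a, 5, 11) == 1 else -1)

examples = [(chi_mod4, 4), (chi_mod8, 8), (legendre_mod11, 11)]
results = [(m, max_partial_sum(chi, 10_000), euler_phi(m) / 2) for chi, m in examples]
for m, observed, bound in results:
    print(f"m = {m}: max |partial sum| = {observed}, bound phi(m)/2 = {bound}")
```

Because the partial sums are periodic in $N$ with period $m$, scanning one or two periods would already suffice; the larger cut-off is only for reassurance.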


With each Dirichlet character $\chi$ there is associated a Dirichlet L-function

$$L(s, \chi) = \sum_{n=1}^{\infty} \chi(n)/n^s.$$

Since $|\chi(n)| \le 1$ for all $n$, the series is absolutely convergent for $\sigma := \Re s > 1$. We are going to show that if $\chi \neq \chi_1$, then the series is also convergent for $\sigma > 0$. (It does not converge if $\sigma \le 0$, since then $|\chi(n)/n^s| \ge 1$ for infinitely many $n$.)

Put

$$H(x) = \sum_{n \le x} \chi(n).$$

Then

$$\sum_{n \le x} \chi(n)n^{-s} = \int_{1-}^{x+} t^{-s}\,dH(t) = H(x)x^{-s} + s\int_{1}^{x} H(t)t^{-s-1}\,dt.$$

Since $H(x)$ is bounded, by Lemma 3, on letting $x \to \infty$ we obtain

$$L(s, \chi) = s\int_{1}^{\infty} H(t)t^{-s-1}\,dt \quad \text{for } \sigma > 0.$$

Moreover the integral on the right is uniformly convergent in any half-plane $\sigma \ge \delta$, where $\delta > 0$, and hence $L(s, \chi)$ is a holomorphic function for $\sigma > 0$.
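For the character mod 4 of example (ii), $L(1, \chi)$ is Leibniz's series $1 - 1/3 + 1/5 - \cdots = \pi/4$, which gives a concrete check of convergence inside the strip $0 < \sigma \le 1$. The sketch below (helper names are illustrative) compares a partial sum at $s = 1$ with $\pi/4$, and shows that partial sums at $s = 1/2$ settle down; it illustrates convergence only and is not a high-accuracy evaluation of $L(s, \chi)$.

```python
import math

def chi_mod4(n):
    return 0 if n % 2 == 0 else (1 if n % 4 == 1 else -1)

def partial_L(chi, s, N):
    """sum_{n <= N} chi(n) n^{-s} for real s > 0."""
    return sum(chi(n) / n ** s for n in range(1, N + 1))

at_one = partial_L(chi_mod4, 1.0, 100_000)            # compare with pi/4
drift_half = abs(partial_L(chi_mod4, 0.5, 200_000)    # change between N = 10^5 and 2 * 10^5
                 - partial_L(chi_mod4, 0.5, 100_000))
print(at_one, math.pi / 4, drift_half)
```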

The following discussion of Dirichlet L-functions and the prime number theorem for arithmetic progressions runs parallel to that of the Riemann $\zeta$-function and the ordinary prime number theorem in the previous chapter. Consequently we will be more brief.
Proposition 4 $L(s, \chi) = \prod_p (1 - \chi(p)p^{-s})^{-1}$ for $\sigma > 1$, where the product is taken over all primes $p$.


Proof The property $\chi(ab) = \chi(a)\chi(b)$ for all $a, b \in \mathbb{N}$ enables the proof of Euler’s product formula for $\zeta(s)$ to be carried over to the present case. For $\sigma > 0$ we have

$$(1 - \chi(p)p^{-s})^{-1} = 1 + \chi(p)p^{-s} + \chi(p^2)p^{-2s} + \chi(p^3)p^{-3s} + \cdots$$

and hence for $\sigma > 1$

$$\prod_{p \le x} (1 - \chi(p)p^{-s})^{-1} = \sum_{n \in \mathcal{N}_x} \chi(n)n^{-s},$$

where $\mathcal{N}_x$ is the set of all positive integers whose prime factors are all $\le x$. Letting $x \to \infty$, we obtain the result.
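Proposition 4 can be checked numerically at $s = 2$ for the character mod 4, where $L(2, \chi)$ is Catalan's constant $G \approx 0.9159655942$. The sketch truncates both the series and the product at $X = 10\,000$; the cut-off and helper names are arbitrary choices, and the tolerance in the product reflects the omitted primes $p > X$.

```python
def primes_up_to(n):
    """Sieve of Eratosthenes."""
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if is_prime[i]:
            is_prime[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if is_prime[i]]

def chi_mod4(n):
    return 0 if n % 2 == 0 else (1 if n % 4 == 1 else -1)

s, X = 2.0, 10_000
series = sum(chi_mod4(n) / n ** s for n in range(1, X + 1))
product = 1.0
for p in primes_up_to(X):
    product /= 1 - chi_mod4(p) / p ** s       # Euler factor (1 - chi(p) p^{-s})^{-1}

CATALAN = 0.915965594177219                   # L(2, chi) for the character mod 4
print(series, product, CATALAN)
```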


It follows at once that

$$L(s, \chi_1) = \zeta(s)\prod_{p \mid m} (1 - p^{-s})$$

and that, for any Dirichlet character $\chi$, $L(s, \chi) \neq 0$ for $\sigma > 1$.

Proposition 5 $-L'(s, \chi)/L(s, \chi) = \sum_{n=1}^{\infty} \chi(n)\Lambda(n)/n^s$ for $\sigma > 1$.


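Proof The series $\psi(s, \chi) = \sum_{n=1}^{\infty} \chi(n)\Lambda(n)n^{-s}$ converges absolutely and uniformly in any half-plane $\sigma \ge 1 + \varepsilon$, where $\varepsilon > 0$. Moreover, as in the proof of Proposition IX.6,

$$\begin{aligned} L(s, \chi)\psi(s, \chi) &= \sum_{j=1}^{\infty} \chi(j)j^{-s}\sum_{k=1}^{\infty} \chi(k)\Lambda(k)k^{-s} = \sum_{n=1}^{\infty} n^{-s}\sum_{jk=n} \chi(j)\chi(k)\Lambda(k) \\ &= \sum_{n=1}^{\infty} \chi(n)n^{-s}\sum_{d \mid n} \Lambda(d) = \sum_{n=1}^{\infty} \chi(n)n^{-s}\log n = -L'(s, \chi). \end{aligned}$$

Since $L(s, \chi) \neq 0$ for $\sigma > 1$, the result follows on dividing by $L(s, \chi)$.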
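The key step in this proof is the coefficient identity $\sum_{jk=n} \chi(j)\chi(k)\Lambda(k) = \chi(n)\log n$, which comes from complete multiplicativity together with $\sum_{d \mid n} \Lambda(d) = \log n$. A sketch that checks it for $n \le 1000$ with the character mod 8 of example (iii); the trial-division von Mangoldt function is only suitable for small $n$, and the helper names are illustrative.

```python
import math

def von_mangoldt(n):
    """Lambda(n) = log p if n = p^k (k >= 1) for a prime p, and 0 otherwise."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
        p += 1
    return math.log(n)          # no factor up to sqrt(n): n is prime

def chi_mod8(a):
    return 0 if a % 2 == 0 else (1 if a % 8 in (1, 7) else -1)

def product_coefficient(chi, n):
    """Coefficient of n^{-s} in L(s, chi) psi(s, chi): sum over jk = n of chi(j) chi(k) Lambda(k)."""
    return sum(chi(n // k) * chi(k) * von_mangoldt(k) for k in range(1, n + 1) if n % k == 0)

max_error = max(abs(product_coefficient(chi_mod8, n) - chi_mod8(n) * math.log(n))
                for n in range(1, 1001))
print(max_error)
```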


As in the proof of Proposition IX.6, we can also prove directly that $L(s, \chi) \neq 0$ for $\sigma > 1$, and thus make the proof of the prime number theorem for arithmetic progressions independent of Proposition 4.

The following general result, due to Landau (1905), considerably simplifies the 
subsequent argument (and has other applications). 


Proposition 6 Let $\phi(x)$ be a nondecreasing function for $x \ge 0$ such that the integral

$$f(s) = \int_{0}^{\infty} e^{-sx}\,d\phi(x) \tag{$\dagger$}$$

is convergent for $\Re s > \beta$. Thus $f$ is holomorphic in this half-plane. If the definition of $f$ can be extended so that it is holomorphic on the real segment $(\alpha, \beta]$, then the integral in ($\dagger$) is convergent also for $\Re s > \alpha$. Thus $f$ is actually holomorphic, and ($\dagger$) holds, in this larger half-plane.
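Proof Since $f$ is holomorphic at $\beta$, we can choose $\delta > 0$ so that $f$ is holomorphic in the disc $|s - (\beta + \delta)| < 2\delta$. Thus its Taylor series converges in this disc. But for $\Re s > \beta$ the $n$-th derivative of $f$ is given by

$$f^{(n)}(s) = (-1)^n\int_{0}^{\infty} e^{-sx}x^n\,d\phi(x).$$

Hence, for any $\sigma$ such that $\beta - \delta < \sigma < \beta + \delta$,

$$\begin{aligned} f(\sigma) &= \sum_{n=0}^{\infty} (\sigma - \beta - \delta)^n f^{(n)}(\beta + \delta)/n! \\ &= \sum_{n=0}^{\infty} (\sigma - \beta - \delta)^n(-1)^n\int_{0}^{\infty} e^{-(\beta + \delta)x}x^n\,d\phi(x)/n! \\ &= \sum_{n=0}^{\infty} \int_{0}^{\infty} e^{-(\beta + \delta)x}(\beta + \delta - \sigma)^nx^n/n!\,d\phi(x). \end{aligned}$$

Since the integrands are non-negative, we can interchange the orders of summation and integration, obtaining

$$\begin{aligned} f(\sigma) &= \int_{0}^{\infty} e^{-(\beta + \delta)x}\sum_{n=0}^{\infty} (\beta + \delta - \sigma)^nx^n/n!\,d\phi(x) \\ &= \int_{0}^{\infty} e^{-(\beta + \delta)x}e^{(\beta + \delta - \sigma)x}\,d\phi(x) \\ &= \int_{0}^{\infty} e^{-\sigma x}\,d\phi(x). \end{aligned}$$

Thus the integral defining $f$ converges for real $s > \beta - \delta$.

Let $\gamma$ be the greatest lower bound of all real $s \in (\alpha, \beta]$ for which the integral defining $f$ converges. Then this integral is also convergent for $\Re s > \gamma$ and defines there a holomorphic function. Since this holomorphic function coincides with $f(s)$ for $\Re s > \beta$, it follows that the integral representation of $f$ holds for $\Re s > \gamma$. Moreover $\gamma = \alpha$, since if $\gamma > \alpha$ we could replace $\beta$ by $\gamma$ in the preceding argument and thus obtain a contradiction to the definition of $\gamma$.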


The punch-line is the following proposition:

Proposition 7 $L(1 + it, \chi) \neq 0$ for every real $t$ and every $\chi \neq \chi_1$.


Proof Assume on the contrary that $L(1 + i\alpha, \chi) = 0$ for some real $\alpha$ and some $\chi \neq \chi_1$. Then also $L(1 - i\alpha, \bar{\chi}) = 0$. If we put

$$f(s) = \zeta^2(s)L(s + i\alpha, \chi)L(s - i\alpha, \bar{\chi}),$$

then $f$ is holomorphic and nonzero for $\sigma > 1$. Furthermore $f$ is holomorphic on the real segment $[1/2, 1]$, since the double pole of $\zeta^2(s)$ at $s = 1$ is cancelled by the zeros of the other two factors. By logarithmic differentiation we obtain, for $\sigma > 1$,
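$$\begin{aligned} -f'(s)/f(s) &= -2\zeta'(s)/\zeta(s) - L'(s + i\alpha, \chi)/L(s + i\alpha, \chi) - L'(s - i\alpha, \bar{\chi})/L(s - i\alpha, \bar{\chi}) \\ &= 2\sum_{n=1}^{\infty} \Lambda(n)n^{-s} + \sum_{n=1}^{\infty} \chi(n)\Lambda(n)n^{-s-i\alpha} + \sum_{n=1}^{\infty} \bar{\chi}(n)\Lambda(n)n^{-s+i\alpha} \\ &= \sum_{n=2}^{\infty} c_n n^{-s}, \end{aligned}$$

where

$$c_n = \{2 + \chi(n)n^{-i\alpha} + \bar{\chi}(n)n^{i\alpha}\}\Lambda(n) = 2\{1 + \Re(\chi(n)n^{-i\alpha})\}\Lambda(n).$$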


Since $|\chi(n)| \le 1$ and $|n^{-i\alpha}| = 1$, it follows that $c_n \ge 0$ for all $n \ge 2$. If we put

$$g(s) = \sum_{n=2}^{\infty} c_n n^{-s}/\log n,$$

then $g'(s) = f'(s)/f(s)$ for $\sigma > 1$ and so the derivative of $e^{-g(s)}f(s)$ is

$$\{f'(s) - g'(s)f(s)\}e^{-g(s)} = 0.$$


Thus $f(s) = Ce^{g(s)}$, where $C$ is a constant. In fact $C = 1$, since $g(\sigma) \to 0$ and $f(\sigma) \to 1$ as $\sigma \to +\infty$. Since $g(s)$ is the sum of an absolutely convergent Dirichlet series with nonnegative coefficients, so also are the powers $g^k(s)$ $(k = 2, 3, \ldots)$. Hence also

$$f(s) = e^{g(s)} = 1 + g(s) + g^2(s)/2! + \cdots = \sum_{n=1}^{\infty} a_n n^{-s} \quad \text{for } \sigma > 1,$$


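where $a_n \ge 0$ for every $n$. Writing $\sum_{n=1}^{\infty} a_n n^{-s} = \int_0^\infty e^{-sx}\,d\phi(x)$ with the nondecreasing function $\phi(x) = \sum_{\log n \le x} a_n$, it follows from Proposition 6 that the series $\sum_{n=1}^{\infty} a_n n^{-\sigma}$ must actually converge with sum $f(\sigma)$ for $\sigma \ge 1/2$. We will show that this leads to a contradiction.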
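Take $n = p^2$, where $p$ is a prime. Then, by the manner of its formation,

$$\begin{aligned} a_n &\ge c_n/\log n + c_p^2/(2\log^2 p) \\ &= \{2 + \chi(p^2)p^{-2i\alpha} + \bar{\chi}(p^2)p^{2i\alpha}\}/2 + \{2 + \chi(p)p^{-i\alpha} + \bar{\chi}(p)p^{i\alpha}\}^2/2 \\ &= 2 - \chi(p)\bar{\chi}(p) + \{1 + \chi(p)p^{-i\alpha} + \bar{\chi}(p)p^{i\alpha}\}^2 \ge 1, \end{aligned}$$

since $|\chi(p)| \le 1$. Hence

$$f(1/2) = \sum_{n=1}^{\infty} a_n/n^{1/2} \ge \sum_{n=p^2} a_n/n^{1/2} \ge \sum_{p} 1/p.$$

Since $\sum_p 1/p$ diverges, this is a contradiction.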
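The algebra in the last display can be double-checked numerically. Write $z = \chi(p)p^{-i\alpha}$ (a shorthand introduced only for this check), so that $|z| \le 1$; the two contributions to $a_{p^2}$ are then $(2 + z^2 + \bar{z}^2)/2$ and $(2 + z + \bar{z})^2/2$. The sketch samples random $z$ in the closed unit disc and confirms both the identity with $2 - |z|^2 + (1 + z + \bar{z})^2$ and the lower bound 1, which is attained when $|z| = 1$ and $\Re z = -1/2$.

```python
import cmath
import random

def contributions(z):
    """c_{p^2}/log(p^2) + c_p^2/(2 log^2 p), written in terms of z = chi(p) p^{-i alpha}."""
    zb = z.conjugate()
    return (2 + z**2 + zb**2) / 2 + (2 + z + zb) ** 2 / 2

def closed_form(z):
    """2 - chi(p) conj(chi(p)) + {1 + z + conj(z)}^2, using |p^{-i alpha}| = 1."""
    zb = z.conjugate()
    return 2 - abs(z) ** 2 + (1 + z + zb) ** 2

random.seed(0)
max_mismatch, smallest = 0.0, float("inf")
for _ in range(10_000):
    z = cmath.rect(random.random(), random.uniform(0, 2 * cmath.pi))   # |z| <= 1
    value = contributions(z)
    max_mismatch = max(max_mismatch, abs(value - closed_form(z)))
    smallest = min(smallest, value.real)
print(max_mismatch, smallest)
```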
Proposition 8 $\sum_{n \le x} \chi_1(n)\Lambda(n) \sim x$, $\quad \sum_{n \le x} \chi(n)\Lambda(n) = o(x)$ if $\chi \neq \chi_1$.


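Proof For any Dirichlet character $\chi$, put

$$\begin{aligned} g(s) &= -\zeta'(s)/\zeta(s) - L'(s, \chi)/2L(s, \chi) - L'(s, \bar{\chi})/2L(s, \bar{\chi}), \\ h(s) &= -\zeta'(s)/\zeta(s) - L'(s, \chi)/2iL(s, \chi) + L'(s, \bar{\chi})/2iL(s, \bar{\chi}). \end{aligned}$$

For $\sigma = \Re s > 1$ we have

$$g(s) = \sum_{n=1}^{\infty} \{1 + \Re\chi(n)\}\Lambda(n)n^{-s}, \qquad h(s) = \sum_{n=1}^{\infty} \{1 + \Im\chi(n)\}\Lambda(n)n^{-s}.$$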


If $\chi \neq \chi_1$ then, by Proposition 7, $g(s) - 1/(s - 1)$ and $h(s) - 1/(s - 1)$ are holomorphic for $\Re s \ge 1$. Since the coefficients of the Dirichlet series for $g(s)$ and $h(s)$ are nonnegative, it follows from Ikehara’s theorem (Theorem IX.9) that

$$\sum_{n \le x} \{1 + \Re\chi(n)\}\Lambda(n) \sim x, \qquad \sum_{n \le x} \{1 + \Im\chi(n)\}\Lambda(n) \sim x.$$


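On the other hand, if $\chi = \chi_1$ then $g(s) - 2/(s - 1)$ and $h(s) - 1/(s - 1)$ are holomorphic for $\Re s \ge 1$, from which we obtain in the same way

$$\sum_{n \le x} \{1 + \chi_1(n)\}\Lambda(n) \sim 2x, \qquad \sum_{n \le x} \Lambda(n) \sim x.$$

Subtracting the last relation from each of the others, we obtain the result.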


The prime number theorem for arithmetic progressions can now be deduced immediately. For, by the orthogonality relations and Proposition 8, if $1 \le a < m$ and $(a, m) = 1$, then

$$\psi(x; m, a) = \sum_{n \le x,\ n \equiv a \bmod m} \Lambda(n) = \sum_{\chi} \bar{\chi}(a)\sum_{n \le x} \chi(n)\Lambda(n)/\varphi(m) \sim x/\varphi(m).$$
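The relation $\psi(x; m, a) \sim x/\varphi(m)$ can be illustrated (not proved) numerically. The sketch tabulates $\Lambda(n)$ with a smallest-prime-factor sieve and compares $\psi(x; 8, a)$ with $x/\varphi(8)$ for $x = 10^5$ and each of the four admissible classes; the modulus, the cut-off and the 5% tolerance in the check are arbitrary choices.

```python
import math

def von_mangoldt_table(N):
    """Lambda(n) for 0 <= n <= N, using a smallest-prime-factor sieve."""
    spf = list(range(N + 1))
    for i in range(2, int(N ** 0.5) + 1):
        if spf[i] == i:                          # i is prime
            for j in range(i * i, N + 1, i):
                if spf[j] == j:
                    spf[j] = i
    lam = [0.0] * (N + 1)
    for n in range(2, N + 1):
        p, k = spf[n], n
        while k % p == 0:
            k //= p
        if k == 1:                               # n is a power of its smallest prime factor
            lam[n] = math.log(p)
    return lam

def psi_progression(x, m, a, lam):
    """psi(x; m, a): sum of Lambda(n) over n <= x with n = a mod m (1 <= a < m)."""
    return sum(lam[n] for n in range(a, x + 1, m))

x, m = 100_000, 8
lam = von_mangoldt_table(x)
residues = [a for a in range(1, m) if math.gcd(a, m) == 1]      # phi(8) = 4 classes
ratios = {a: psi_progression(x, m, a, lam) / (x / len(residues)) for a in residues}
print(ratios)
```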
It is also possible to obtain error bounds in the prime number theorem for arithmetic progressions of the same type as those in the ordinary prime number theorem. For example, it may be shown that for each $\alpha > 0$,

$$\begin{aligned} \psi(x; m, a) &= x/\varphi(m) + O(x/\log^{\alpha} x), \\ \pi(x; m, a) &= \mathrm{Li}(x)/\varphi(m) + O(x/\log^{\alpha} x), \end{aligned}$$

where the constants implied by the $O$-symbols depend on $\alpha$, but not on $m$ or $a$.
In the same manner as for the Riemann zeta-function $\zeta(s)$ it may be shown that the Dirichlet L-function $L(s, \chi)$ satisfies a functional equation, provided $\chi$ is a primitive character. (Here a Dirichlet character $\chi$ mod $m$ is primitive if for each proper divisor $d$ of $m$ there exists an integer $a \equiv 1 \bmod d$ with $(a, m) = 1$ and $\chi(a) \neq 1$.) Explicitly, if $\chi$ is a primitive character mod $m$ and if one puts

$$\Lambda(s, \chi) = (m/\pi)^{s/2}\Gamma((s + \delta)/2)L(s, \chi),$$

where $\delta = 0$ or $1$ according as $\chi(-1) = 1$ or $-1$, then

$$\Lambda(1 - s, \bar{\chi}) = \epsilon_\chi\Lambda(s, \chi),$$

where

$$\epsilon_\chi = i^{-\delta}m^{-1/2}\sum_{k=1}^{m} \bar{\chi}(k)e^{2\pi ik/m}.$$
k=1 
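It follows from the functional equation that $|\epsilon_\chi| = 1$. Indeed, by taking complex conjugates we obtain, for real $s$,

$$\Lambda(1 - s, \chi) = \bar{\epsilon}_\chi\Lambda(s, \bar{\chi})$$

and hence, on replacing $s$ by $1 - s$,

$$\Lambda(s, \chi) = \bar{\epsilon}_\chi\Lambda(1 - s, \bar{\chi}) = \bar{\epsilon}_\chi\epsilon_\chi\Lambda(s, \chi).$$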
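The root number can be computed directly from the Gauss sum $\epsilon_\chi = i^{-\delta}m^{-1/2}\sum_{k=1}^{m} \bar{\chi}(k)e^{2\pi ik/m}$. The sketch below, assuming this formula with $\delta$ read off from $\chi(-1)$, evaluates $\epsilon_\chi$ for the character mod 4 of example (ii) and for the three nonprincipal characters mod 5 (all primitive, since 5 is prime), and checks $|\epsilon_\chi| = 1$. For the two real characters here one finds $\epsilon_\chi = 1$. The helper names are hypothetical.

```python
import cmath

def root_number(chi, m):
    """epsilon_chi = i^{-delta} m^{-1/2} sum_{k=1}^{m} conj(chi(k)) e^{2 pi i k/m}, chi primitive mod m."""
    delta = 0 if abs(chi(m - 1) - 1) < 1e-9 else 1        # chi(-1) = chi(m - 1) is +1 or -1
    gauss = sum(complex(chi(k)).conjugate() * cmath.exp(2j * cmath.pi * k / m)
                for k in range(1, m + 1))
    i_power = 1 if delta == 0 else -1j                     # i^{-delta}
    return i_power * gauss / m ** 0.5

def chi_mod4(n):
    return 0 if n % 2 == 0 else (1 if n % 4 == 1 else -1)

def char_mod5(j):
    """chi_j(2^k) = exp(2 pi i j k / 4), using the primitive root 2 mod 5."""
    log2 = {1: 0, 2: 1, 4: 2, 3: 3}                       # 2^0, 2^1, 2^2, 2^3 mod 5
    return lambda n: 0 if n % 5 == 0 else cmath.exp(2j * cmath.pi * j * log2[n % 5] / 4)

eps = {"mod 4": root_number(chi_mod4, 4)}
eps.update({f"mod 5, j = {j}": root_number(char_mod5(j), 5) for j in (1, 2, 3)})
for name, value in eps.items():
    print(name, value, abs(value))
```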


The extended Riemann hypothesis implies that no Dirichlet L-function $L(s, \chi)$ has a zero in the half-plane $\Re s > 1/2$, since, apart from finitely many Euler factors which have no zeros in $\Re s > 0$, the product $\prod_\chi L(s, \chi)$ is the Dedekind zeta-function $\zeta_K(s)$ of the algebraic number field $K = \mathbb{Q}(e^{2\pi i/m})$. Hence it may be shown that if the extended Riemann hypothesis holds, then

$$\psi(x; m, a) = x/\varphi(m) + O(x^{1/2}\log^2 x)$$

and

$$\pi(x; m, a) = \mathrm{Li}(x)/\varphi(m) + O(x^{1/2}\log x),$$

where the constants implied by the $O$-symbols are independent of $m$ and $a$.
Assuming the extended Riemann hypothesis, Bach and Sorenson (1996) have shown that, for any $a, m$ with $1 \le a < m$ and $(a, m) = 1$, the least prime $p \equiv a \bmod m$ satisfies $p < 2(m\log m)^2$.

Without any hypothesis, Linnik (1944) proved that there exists an absolute constant $L$ such that the least prime in any arithmetic progression $a, a + m, a + 2m, \ldots$, where $1 \le a < m$ and $(a, m) = 1$, does not exceed $m^L$ if $m$ is sufficiently large. Heath-Brown (1992) has shown that one can take any $L > 11/2$.
4 Representations of Arbitrary Finite Groups


The problem of extending the character theory of finite abelian groups to arbitrary finite groups was proposed by Dedekind and solved by Frobenius (1896). Simplifications were afterwards found by Frobenius himself, Burnside and Schur (1905). We will follow Schur’s treatment, which is distinguished by its simplicity. It turns out that for nonabelian groups the concept of ‘representation’ is more fundamental than that of ‘character’.

A representation of a group $G$ is a mapping $\rho$ of $G$ into the set of all linear transformations of a finite-dimensional vector space $V$ over the field $\mathbb{C}$ of complex numbers which preserves products, i.e.

$$\rho(st) = \rho(s)\rho(t) \quad \text{for all } s, t \in G, \tag{1}$$

and maps the identity element of $G$ into the identity transformation of $V$: $\rho(e) = I$. The dimension of the vector space $V$ is called the degree of the representation (although ‘dimension’ would be more natural).

It follows at once from (1) that

$$\rho(s)\rho(s^{-1}) = \rho(s^{-1})\rho(s) = I.$$

Thus, for every $s \in G$, $\rho(s)$ is an invertible linear transformation of $V$ and $\rho(s^{-1}) = \rho(s)^{-1}$. (Hence a representation of $G$ is a homomorphism of $G$ into the group $GL(V)$ of all invertible linear transformations of $V$.)

Any group has a trivial representation of degree 1 in which every element of the group is mapped into the scalar 1.

Also, with any group $G$ of finite order $g$ a representation of degree $g$ may be defined in the following way. Let $s_1, \ldots, s_g$ be an enumeration of the elements of $G$ and let $e_1, \ldots, e_g$ be a basis for a $g$-dimensional vector space $V$ over $\mathbb{C}$. We define a linear transformation $A(s_i)$ of $V$ by its action on the basis elements:

$$A(s_i)e_j = e_k \quad \text{if } s_is_j = s_k.$$

Then, for all $s, t \in G$,

$$A(s^{-1})A(s) = I, \qquad A(st) = A(s)A(t).$$

Thus the mapping $\rho_R : s_i \mapsto A(s_i)$ is a representation of $G$, known as the regular representation.
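The regular representation is easy to build explicitly. The sketch below takes $G$ to be the symmetric group $S_3$ (an arbitrary choice), with permutations stored as tuples and the product $st$ meaning "first $t$, then $s$"; it builds the $6 \times 6$ matrices $A(s)$ from the rule $A(s_i)e_j = e_k$ if $s_is_j = s_k$ and checks $A(st) = A(s)A(t)$ for all pairs.

```python
from itertools import permutations

def compose(s, t):
    """Product st of permutations stored as tuples: (st)(x) = s(t(x))."""
    return tuple(s[t[x]] for x in range(len(t)))

G = list(permutations(range(3)))          # s_1, ..., s_g: the symmetric group S_3, g = 6
index = {s: i for i, s in enumerate(G)}
g = len(G)

def regular_matrix(s):
    """Matrix of A(s): A(s) e_j = e_k whenever s s_j = s_k (column j has its 1 in row k)."""
    A = [[0] * g for _ in range(g)]
    for j, sj in enumerate(G):
        A[index[compose(s, sj)]][j] = 1
    return A

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

identity = [[int(i == j) for j in range(g)] for i in range(g)]
e = tuple(range(3))
homomorphism = all(regular_matrix(compose(s, t)) == matmul(regular_matrix(s), regular_matrix(t))
                   for s in G for t in G)
print(regular_matrix(e) == identity, homomorphism)
```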

By choosing a basis for the vector space we can reformulate the preceding definitions in terms of matrices. A representation of a group $G$ is then a product-preserving map $s \mapsto A(s)$ of $G$ into the group of all $n \times n$ non-singular matrices of complex numbers. The positive integer $n$ is the degree of the representation. However, we must regard two matrix representations $s \mapsto A(s)$ and $s \mapsto B(s)$ as equivalent if one is obtained from the other simply by changing the basis of the vector space, i.e. if there exists a non-singular matrix $T$ such that

$$T^{-1}A(s)T = B(s) \quad \text{for every } s \in G.$$
It is easily verified that if $s \mapsto A(s)$ is a matrix representation of degree $n$ of a group $G$, then $s \mapsto A(s^{-1})^t$ (the transpose of $A(s^{-1})$) is a representation of the same degree, the contragredient representation. Furthermore, $s \mapsto \det A(s)$ is a representation of degree 1.

Again, if $\rho : s \mapsto A(s)$ and $\sigma : s \mapsto B(s)$ are matrix representations of a group $G$, of degrees $m$ and $n$ respectively, then the Kronecker product mapping

$$s \mapsto A(s) \otimes B(s)$$

is also a representation of $G$, of degree $mn$, since

$$(A(s) \otimes B(s))(A(t) \otimes B(t)) = A(st) \otimes B(st).$$

We will call this representation simply the product of the representations $\rho$ and $\sigma$, and denote it by $\rho \otimes \sigma$.
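The three constructions of this paragraph (contragredient, determinant, Kronecker product) can be checked on a small example. This sketch uses $S_3$ and its $3 \times 3$ permutation matrices, conjugated by a fixed non-orthogonal matrix $T$ (an arbitrary choice) so that the contragredient is not simply the original representation; `kron`, `det` and the other helpers are written out by hand rather than taken from a library.

```python
from itertools import permutations

def compose(s, t):
    """(st)(x) = s(t(x)) for permutations stored as tuples."""
    return tuple(s[t[x]] for x in range(len(t)))

def inverse(s):
    inv = [0] * len(s)
    for x, y in enumerate(s):
        inv[y] = x
    return tuple(inv)

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def kron(A, B):
    """Kronecker product: entry ((i1, i2), (j1, j2)) is A[i1][j1] * B[i2][j2]."""
    p, q = len(B), len(B[0])
    return [[A[i // p][j // q] * B[i % p][j % q] for j in range(len(A[0]) * q)]
            for i in range(len(A) * p)]

def det(A):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

def perm_matrix(s):
    """P(s) e_j = e_{s(j)}: column j has its single 1 in row s(j)."""
    n = len(s)
    return [[int(s[j] == i) for j in range(n)] for i in range(n)]

T, T_inv = [[1, 1, 0], [0, 1, 0], [0, 0, 1]], [[1, -1, 0], [0, 1, 0], [0, 0, 1]]
A = lambda s: matmul(T_inv, matmul(perm_matrix(s), T))    # an equivalent, non-unitary form
contragredient = lambda s: transpose(A(inverse(s)))        # s -> A(s^{-1})^t
determinant = lambda s: [[det(A(s))]]                      # s -> det A(s), degree 1
product = lambda s: kron(A(s), A(s))                       # A (x) A, degree 9

G = list(permutations(range(3)))

def is_representation(rep):
    return all(rep(compose(s, t)) == matmul(rep(s), rep(t)) for s in G for t in G)
```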

The basic problem of representation theory is to determine all possible representa- 
tions of a given group. As we will see, all representations may in fact be built up from 
certain ‘irreducible’ ones. 

Let $\rho$ be a representation of a group $G$ by linear transformations of a vector space $V$. If a subspace $U$ of $V$ is invariant under $G$, i.e. if

$$\rho(s)U \subseteq U \quad \text{for every } s \in G,$$

then the restrictions to $U$ of the given linear transformations provide a representation $\rho_U$ of $G$ by linear transformations of the vector space $U$. If it happens that there exists another subspace $W$ invariant under $G$ such that $V$ is the direct sum of $U$ and $W$, i.e. $V = U + W$ and $U \cap W = \{0\}$, then the representation $\rho$ is completely determined by the representations $\rho_U$ and $\rho_W$ and will be said simply to be their sum.

A representation $\rho$ of a group $G$ by linear transformations of a vector space $V$ is said to be irreducible if no nontrivial proper subspace of $V$ is invariant under $G$, and reducible otherwise. Evidently any representation of degree 1 is irreducible.

A matrix representation $s \mapsto A(s)$, of degree $n$, of a group $G$ is reducible if it is equivalent to a representation in which all matrices have the block form

$$\begin{pmatrix} P(s) & Q(s) \\ 0 & R(s) \end{pmatrix},$$

where $P(s)$ is a square matrix of order $m$, $0 < m < n$. Then $s \mapsto P(s)$ and $s \mapsto R(s)$ are representations of $G$ of degrees $m$ and $n - m$ respectively. The given representation is the sum of these representations if there exists a non-singular matrix $T$ such that

$$T^{-1}A(s)T = \begin{pmatrix} P(s) & 0 \\ 0 & R(s) \end{pmatrix} \quad \text{for every } s \in G.$$


The following theorem of Maschke (1899) reduces the problem of finding all 
representations of a finite group to that of finding all irreducible representations. 


Proposition 9 Every representation of a finite group is (equivalent to) a sum of 
irreducible representations. 


412 X A Character Study 
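Proof We give a constructive proof due to Schur. Let $s \mapsto A(s)$, where

$$A(s) = \begin{pmatrix} P(s) & Q(s) \\ 0 & R(s) \end{pmatrix},$$

be a reducible representation of a group $G$ of finite order $g$. Since the mapping $s \mapsto A(s)$ preserves products, we have

$$P(st) = P(s)P(t), \quad R(st) = R(s)R(t), \quad Q(st) = P(s)Q(t) + Q(s)R(t). \tag{2}$$

The non-singular matrix

$$T = \begin{pmatrix} I & M \\ 0 & I \end{pmatrix}$$

satisfies

$$\begin{pmatrix} P(t) & Q(t) \\ 0 & R(t) \end{pmatrix} T = T \begin{pmatrix} P(t) & 0 \\ 0 & R(t) \end{pmatrix} \tag{3}$$

if and only if

$$MR(t) = P(t)M + Q(t).$$

Take

$$M = g^{-1}\sum_{s \in G} Q(s)R(s^{-1}).$$

Then, by (2),

$$\begin{aligned} P(t)M &= g^{-1}\sum_{s \in G} \{Q(ts) - Q(t)R(s)\}R(s^{-1}) \\ &= g^{-1}\sum_{s \in G} Q(ts)R((ts)^{-1})R(t) - Q(t) = MR(t) - Q(t), \end{aligned}$$

and hence (3) holds.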


Thus the given reducible representation $s \mapsto A(s)$ is the sum of two representations $s \mapsto P(s)$ and $s \mapsto R(s)$ of lower degree. The result follows by induction on the degree.
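Schur’s construction is completely explicit, so it can be carried out exactly with rational arithmetic. The sketch below assumes the representation is already given in block upper-triangular form, with the group listed as the powers of a single matrix. The two generators are hypothetical examples: a matrix of order 2, and the cyclic permutation $e_1 \to e_2 \to e_3 \to e_1$ written in the basis $(1,1,1), e_2, e_3$, so that the invariant line is spanned by the first basis vector. The code forms $M = g^{-1}\sum_s Q(s)R(s^{-1})$ and checks that $T^{-1}A(s)T$ is block diagonal.

```python
from fractions import Fraction

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def cyclic_group(gen):
    """All powers of a square matrix of finite order: a finite cyclic matrix group."""
    n = len(gen)
    elements = [identity(n)]
    X = [[Fraction(x) for x in row] for row in gen]
    while X != identity(n):
        elements.append(X)
        X = matmul(X, gen)
    return elements

def schur_split(G, m):
    """Schur's construction for matrices [[P, Q], [0, R]] with P of order m.

    Returns T = [[I, M], [0, I]] and its inverse [[I, -M], [0, I]],
    where M = g^{-1} sum_s Q(s) R(s^{-1}).
    """
    n, g = len(G[0]), len(G)
    M = [[Fraction(0)] * (n - m) for _ in range(m)]
    for A in G:
        A_inv = next(B for B in G if matmul(A, B) == identity(n))     # matrix of s^{-1}
        for i in range(m):
            for j in range(n - m):
                # (Q(s) R(s^{-1}))_{ij}: Q is rows < m, columns >= m; R is rows and columns >= m
                M[i][j] += sum(A[i][m + k] * A_inv[m + k][m + j] for k in range(n - m)) / g
    T, T_inv = identity(n), identity(n)
    for i in range(m):
        for j in range(n - m):
            T[i][m + j], T_inv[i][m + j] = M[i][j], -M[i][j]
    return T, T_inv

def block_diagonalises(gen, m=1):
    """True if T^{-1} A(s) T has a vanishing upper-right block for every s in the group."""
    G = cyclic_group(gen)
    T, T_inv = schur_split(G, m)
    return all(matmul(T_inv, matmul(A, T))[i][j] == 0
               for A in G for i in range(m) for j in range(m, len(A)))

examples = [
    [[1, 1], [0, -1]],                        # order 2: P = 1, Q = 1, R = -1
    [[1, 0, 1], [0, 0, -1], [0, 1, -1]],      # order 3: cyclic shift in the basis (1,1,1), e_2, e_3
]
print([block_diagonalises(gen) for gen in examples])
```

In the first example $M = -1/2$, and one checks by hand that $MR(s) = P(s)M + Q(s)$ reads $1/2 = -1/2 + 1$.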


Maschke’s original proof of Proposition 9 depended on showing that every representation of a finite group is equivalent to a representation by unitary matrices. We briefly sketch the argument. Let $\rho : s \mapsto A(s)$ be a representation of a finite group $G$ by linear transformations of a finite-dimensional vector space $V$. We may suppose $V$ equipped with a positive definite inner product $(u, v)$. It is easily verified that

$$(u, v)_G = g^{-1}\sum_{t \in G} (A(t)u, A(t)v)$$
is also a positive definite inner product on $V$ and that it is invariant under $G$, i.e.

$$(A(s)u, A(s)v)_G = (u, v)_G \quad \text{for every } s \in G.$$

If $U$ is a subspace of $V$ which is invariant under $G$, and if $U^\perp$ is the subspace consisting of all vectors $v \in V$ such that $(u, v)_G = 0$ for every $u \in U$, then $U^\perp$ is also invariant under $G$ and $V$ is the direct sum of $U$ and $U^\perp$. Thus $\rho$ is the sum of its restrictions to $U$ and $U^\perp$.

The basic result for irreducible representations is Schur’s lemma, which comes in 
two parts: 


Proposition 10 (i) Let $s \mapsto A_1(s)$ and $s \mapsto A_2(s)$ be irreducible representations of a group $G$ by linear transformations of the vector spaces $V_1$ and $V_2$. If there exists a linear transformation $T \neq 0$ of $V_1$ into $V_2$ such that

$$TA_1(s) = A_2(s)T \quad \text{for every } s \in G,$$

then the spaces $V_1$ and $V_2$ have the same dimension and $T$ is invertible, so that the representations are equivalent.

(ii) Let $s \mapsto A(s)$ be an irreducible representation of a group $G$ by linear transformations of a vector space $V$. A linear transformation $T$ of $V$ has the property

$$TA(s) = A(s)T \quad \text{for every } s \in G \tag{4}$$

if and only if $T = \lambda I$ for some $\lambda \in \mathbb{C}$.


Proof (i) The image of $V_1$ under $T$ is a subspace of $V_2$ which is invariant under the second representation. Since $T \neq 0$ and the representation is irreducible, it must be the whole space: $TV_1 = V_2$. On the other hand, those vectors in $V_1$ whose image under $T$ is $0$ form a subspace of $V_1$ which is invariant under the first representation. Since $T \neq 0$ and the representation is irreducible, it must contain only the zero vector. Hence distinct vectors of $V_1$ have distinct images in $V_2$ under $T$. Thus $T$ is a one-to-one mapping of $V_1$ onto $V_2$.

(ii) By the fundamental theorem of algebra, there exists a complex number $\lambda$ such that $\det(\lambda I - T) = 0$. Hence $T - \lambda I$ is not invertible. But if $T$ has the property (4), so does $T - \lambda I$. Therefore $T - \lambda I = 0$, by (i) with $A_1 = A_2$. It is obvious that, conversely, (4) holds if $T = \lambda I$.


Corollary 11 Every irreducible representation of an abelian group is of degree 1.

Proof Since the group is abelian, each $A(t)$ commutes with every $A(s)$. Hence, by Proposition 10(ii), all elements of the group must be represented by scalar multiples of the identity transformation. But such a representation is irreducible only if its degree is 1.




5 Characters of Arbitrary Finite Groups 


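By definition, the trace of an $n \times n$ matrix $A = (a_{ij})$ is the sum of its main diagonal elements:

$$\operatorname{tr} A = \sum_{i=1}^{n} a_{ii}.$$

It is easily verified that, for any $n \times n$ matrices $A, B$ and any scalars $\lambda, \mu$, we have

$$\begin{aligned} \operatorname{tr}(\lambda A + \mu B) &= \lambda\operatorname{tr} A + \mu\operatorname{tr} B, \\ \operatorname{tr}(AB) = \operatorname{tr}(BA), &\qquad \operatorname{tr}(A \otimes B) = (\operatorname{tr} A)(\operatorname{tr} B). \end{aligned}$$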


Let $\rho : s \mapsto A(s)$ be a matrix representation of a group $G$. By the character of the representation $\rho$ we mean the mapping $\chi : G \to \mathbb{C}$ defined by

$$\chi(s) = \operatorname{tr} A(s).$$

Since $\operatorname{tr}(T^{-1}AT) = \operatorname{tr}(ATT^{-1}) = \operatorname{tr} A$, equivalent representations have the same character. The significance of characters stems from the converse, which will be proved below.

Clearly the character $\chi$ of a representation $\rho$ is a class function, i.e.

$$\chi(st) = \chi(ts) \quad \text{for all } s, t \in G.$$

The degree $n$ of the representation $\rho$ is determined by its character $\chi$, since $A(e) = I$ and hence $\chi(e) = n$.

If the representation $\rho$ is the sum of two representations $\rho'$ and $\rho''$, the corresponding characters $\chi, \chi', \chi''$ evidently satisfy

$$\chi(s) = \chi'(s) + \chi''(s) \quad \text{for every } s \in G.$$

On the other hand, if the representation $\rho$ is the product of the representations $\rho'$ and $\rho''$, then

$$\chi(s) = \chi'(s)\chi''(s) \quad \text{for every } s \in G.$$

Thus the set of all characters of a group is closed under addition and multiplication. The character of an irreducible representation will be called simply an irreducible character.

Let $G$ be a group and $\rho$ a representation of $G$ of degree $n$ with character $\chi$. If $s$ is an element of $G$ of finite order $m$, then by restriction $\rho$ defines a representation of the cyclic group generated by $s$. By Proposition 9 and Corollary 11, this representation is equivalent to a sum of representations of degree 1. Thus if $S$ is the matrix representing $s$, there exists an invertible matrix $T$ such that

$$T^{-1}ST = \operatorname{diag}[\omega_1, \ldots, \omega_n]$$

is a diagonal matrix. Moreover, since $S^m = I$ and

$$T^{-1}S^kT = \operatorname{diag}[\omega_1^k, \ldots, \omega_n^k],$$

$\omega_1, \ldots, \omega_n$ are all $m$-th roots of unity. Thus

$$\chi(s) = \omega_1 + \cdots + \omega_n$$

is a sum of $n$ $m$-th roots of unity. Since the inverse of a root of unity $\omega$ is its complex conjugate $\bar{\omega}$, it follows that

$$\chi(s^{-1}) = \omega_1^{-1} + \cdots + \omega_n^{-1} = \overline{\chi(s)}.$$


Now let $G$ be a group of finite order $g$, and let $\rho : s \mapsto A(s)$ and $\sigma : s \mapsto B(s)$ be irreducible matrix representations of $G$ of degrees $n$ and $m$ respectively. For any $n \times m$ matrix $C$, form the matrix

$$T = \sum_{s \in G} A(s)CB(s^{-1}).$$

Since $ts$ runs through the elements of $G$ at the same time as $s$,

$$A(t)T = TB(t) \quad \text{for every } t \in G.$$

Therefore, by Schur’s lemma, $T = O$ if $\rho$ is not equivalent to $\sigma$ and $T = \lambda I$ if $\rho = \sigma$. In particular, take $C$ to be any one of the $mn$ matrices which have a single entry 1 and all other entries 0. Then if $A = (\alpha_{ij})$, $B = (\beta_{kl})$, we get

$$\sum_{s \in G} \alpha_{ij}(s)\beta_{kl}(s^{-1}) = \begin{cases} 0 & \text{if } \rho, \sigma \text{ are inequivalent}, \\ \lambda_{jk}\delta_{il} & \text{if } \rho = \sigma, \end{cases}$$

where $\delta_{il} = 1$ or $0$ according as $i = l$ or $i \neq l$ (‘Kronecker delta’). Since for $(\alpha_{ij}) = (\beta_{ij})$ the left side is unchanged when $i$ is interchanged with $k$ and $j$ with $l$, we must have $\lambda_{jk} = \lambda\delta_{jk}$. To determine $\lambda$ set $i = l$, $j = k$ and sum with respect to $k$. Since the matrices representing $s$ and $s^{-1}$ are inverse, we get $g = n\lambda$. Thus

$$\sum_{s \in G} \alpha_{ij}(s)\alpha_{kl}(s^{-1}) = \begin{cases} g/n & \text{if } j = k \text{ and } i = l, \\ 0 & \text{otherwise}. \end{cases}$$

If $\mu, \nu$ run through an index set for the inequivalent irreducible representations of $G$, then the relations which have been obtained can be rewritten in the form

$$\sum_{s \in G} \alpha^{(\mu)}_{ij}(s)\alpha^{(\nu)}_{kl}(s^{-1}) = \begin{cases} g/n_\mu & \text{if } \mu = \nu,\ j = k,\ i = l, \\ 0 & \text{otherwise}. \end{cases} \tag{5}$$


The orthogonality relations (5) for the irreducible matrix elements have several corollaries:

(i) The functions $\alpha^{(\mu)}_{ij} : G \to \mathbb{C}$ are linearly independent.

For suppose there exist $\lambda^{(\mu)}_{ij} \in \mathbb{C}$ such that

$$\sum_{\mu, i, j} \lambda^{(\mu)}_{ij}\alpha^{(\mu)}_{ij}(s) = 0 \quad \text{for every } s \in G.$$

Multiplying by $\alpha^{(\nu)}_{lk}(s^{-1})$ and summing over all $s \in G$, we get $(g/n_\nu)\lambda^{(\nu)}_{kl} = 0$. Hence every coefficient $\lambda^{(\nu)}_{kl}$ vanishes.

(ii)

$$\sum_{s \in G} \chi_\mu(s)\chi_\nu(s^{-1}) = g\delta_{\mu\nu}. \tag{6}$$

This follows from (5) by setting $i = j$, $k = l$ and summing over $j, l$.

(iii) The irreducible characters $\chi_\mu$ are linearly independent.

In fact (iii) follows from (6) in the same way that (i) follows from (5).

The orthogonality relations (6) for the irreducible characters enable us to decompose a given representation $\rho$ into irreducible representations. For if $\rho = \oplus_\mu m_\mu\rho_\mu$ is a direct sum decomposition of $\rho$ into irreducible components $\rho_\mu$, where the coefficients $m_\mu$ are non-negative integers, and if $\rho$ has character $\chi$, then

$$\chi(s) = \sum_\mu m_\mu\chi_\mu(s).$$

Multiplying by $\chi_\mu(s^{-1})$ and summing over all $s \in G$, we deduce from (6) that

$$g^{-1}\sum_{s \in G} \chi(s)\chi_\mu(s^{-1}) = m_\mu. \tag{7}$$

Thus the multiplicities $m_\mu$ are uniquely determined by the character $\chi$ of the representation $\rho$. It follows that two representations are equivalent if and only if they have the same character.

In the same way we find

$$g^{-1}\sum_{s \in G} \chi(s)\chi(s^{-1}) = \sum_\mu m_\mu^2. \tag{8}$$

Hence a representation $\rho$ with character $\chi$ is irreducible if and only if

$$g^{-1}\sum_{s \in G} \chi(s)\chi(s^{-1}) = 1.$$


The procedure for decomposing a representation into its irreducible components may be applied, in particular, to the regular representation. Evidently the $g \times g$ matrix representing an element $s$ has all its main diagonal elements 0 if $s \neq e$ and all its main diagonal elements 1 if $s = e$. Thus the character $\chi_R$ of the regular representation $\rho_R$ is given by

$$\chi_R(e) = g, \qquad \chi_R(s) = 0 \quad \text{if } s \neq e.$$
Since $\chi_\nu(e) = n_\nu$ is the degree of the $\nu$-th irreducible representation, it follows from (7) that $m_\nu = n_\nu$. Thus every irreducible representation is contained in the direct sum decomposition of the regular representation, and moreover each occurs as often as its degree.
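For a concrete instance, take $G = S_3$, whose three irreducible characters (trivial, sign, and the 2-dimensional ‘standard’ character, equal to the number of fixed points minus 1) are taken as known here rather than derived. The sketch applies (7) to the regular character $\chi_R$ and confirms that each irreducible representation occurs as often as its degree; it also checks the irreducibility criterion that follows (8). Names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

G = list(permutations(range(3)))          # the symmetric group S_3, g = 6
e = (0, 1, 2)

def inverse(s):
    inv = [0] * len(s)
    for x, y in enumerate(s):
        inv[y] = x
    return tuple(inv)

def sign(s):
    """+1 for even permutations, -1 for odd ones (parity of the number of inversions)."""
    inversions = sum(1 for i in range(len(s)) for j in range(i + 1, len(s)) if s[i] > s[j])
    return -1 if inversions % 2 else 1

# The irreducible characters of S_3, assumed known rather than derived here.
irreducible = {
    "trivial": lambda s: 1,
    "sign": sign,
    "standard": lambda s: sum(1 for x in range(3) if s[x] == x) - 1,   # fixed points minus 1
}

def multiplicity(chi, chi_mu):
    """Formula (7): m_mu = g^{-1} sum_{s in G} chi(s) chi_mu(s^{-1})."""
    return Fraction(sum(chi(s) * chi_mu(inverse(s)) for s in G), len(G))

chi_R = lambda s: len(G) if s == e else 0          # character of the regular representation
regular_mults = {name: multiplicity(chi_R, chi) for name, chi in irreducible.items()}
degrees = {name: chi(e) for name, chi in irreducible.items()}
print(regular_mults, degrees)
```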

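It follows that

$$\sum_\mu n_\mu^2 = g, \qquad \sum_\mu n_\mu\chi_\mu(s) = 0 \quad \text{if } s \neq e. \tag{9}$$

Thus the total number of functions $\alpha^{(\mu)}_{ij}$ is $\sum_\mu n_\mu^2 = g$, which is also the dimension of the vector space of all functions $G \to \mathbb{C}$. Therefore, since they are linearly independent, every function $\phi : G \to \mathbb{C}$ is a linear combination of functions $\alpha^{(\mu)}_{ij}$ occurring in irreducible matrix representations.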
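We show next that every class function $\phi : G \to \mathbb{C}$ is a linear combination of irreducible characters $\chi_\mu$. By what we have just proved $\phi = \sum_\mu \phi_\mu$, where

$$\phi_\mu = \sum_{i,j=1}^{n_\mu} \lambda^{(\mu)}_{ij}\alpha^{(\mu)}_{ij}$$

and $\lambda^{(\mu)}_{ij} \in \mathbb{C}$. But $\phi(st) = \phi(ts)$ and

$$\phi_\mu(st) = \sum_{i,j,k} \lambda^{(\mu)}_{ij}\alpha^{(\mu)}_{ik}(s)\alpha^{(\mu)}_{kj}(t), \qquad \phi_\mu(ts) = \sum_{i,j,k} \lambda^{(\mu)}_{ij}\alpha^{(\mu)}_{ik}(t)\alpha^{(\mu)}_{kj}(s).$$

Since the functions $\alpha^{(\mu)}_{ij}$ are linearly independent, we must have

$$\sum_k \lambda^{(\mu)}_{ik}\alpha^{(\mu)}_{jk}(t) = \sum_k \lambda^{(\mu)}_{kj}\alpha^{(\mu)}_{ki}(t).$$

If we denote by $T^{(\mu)}$ the transpose of the matrix $(\lambda^{(\mu)}_{ij})$, we can rewrite this in the form

$$A^{(\mu)}(t)T^{(\mu)} = T^{(\mu)}A^{(\mu)}(t).$$

Consequently, by Schur’s lemma, $T^{(\mu)} = \lambda_\mu I_{n_\mu}$, and hence $\phi_\mu = \lambda_\mu\chi_\mu$. Thus $\phi = \sum_\mu \lambda_\mu\chi_\mu$.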

Two elements $u, v$ of a group $G$ are said to be conjugate if $v = s^{-1}us$ for some $s \in G$. It is easily verified that conjugacy is an equivalence relation. Consequently $G$ is the union of pairwise disjoint subsets, called conjugacy classes, such that two elements belong to the same subset if and only if they are conjugate. The inverses of all elements in a conjugacy class again form a conjugacy class, the inverse class.

In this terminology a function $\phi : G \to \mathbb{C}$ is a class function if and only if $\phi(u) = \phi(v)$ whenever $u$ and $v$ belong to the same conjugacy class. Thus the number of linearly independent class functions is just the number of conjugacy classes in $G$. Since the characters $\chi_\mu$ form a basis for the class functions, it follows that the number of inequivalent irreducible representations is equal to the number of conjugacy classes in the group.

If a group of order $g$ has $r$ conjugacy classes then, by (9), $g = n_1^2 + \cdots + n_r^2$. Since it is abelian if and only if every conjugacy class contains exactly one element, i.e. if and only if $r = g$, it follows that a finite group is abelian if and only if every irreducible representation has degree 1.

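Let $\mathcal{C}_1, \ldots, \mathcal{C}_r$ be the conjugacy classes of the group $G$ and let $h_k$ be the number of elements in $\mathcal{C}_k$ $(k = 1, \ldots, r)$. Changing notation, we will now denote by $\chi_{ik}$ the common value of the character of all elements in the $k$-th conjugacy class in the $i$-th irreducible representation. Then, since $\chi(s^{-1}) = \overline{\chi(s)}$, the orthogonality relations (6) can be rewritten in the form

$$g^{-1}\sum_{j=1}^{r} h_j\chi_{ij}\bar{\chi}_{kj} = \begin{cases} 1 & \text{if } i = k, \\ 0 & \text{if } i \neq k. \end{cases} \tag{10}$$

Thus the $r \times r$ matrices $A = (\chi_{ik})$, $B = (g^{-1}h_i\bar{\chi}_{ki})$ satisfy $AB = I$. Therefore also $BA = I$, i.e.

$$\sum_{j=1}^{r} \bar{\chi}_{ji}\chi_{jk} = \begin{cases} g/h_i & \text{if } i = k, \\ 0 & \text{if } i \neq k. \end{cases} \tag{11}$$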
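The group $S_3$ also illustrates (10) and (11). The sketch computes the conjugacy classes directly from the definition, evaluates the three irreducible characters (taken as known) on class representatives, and checks both relations in exact rational arithmetic; since all characters of $S_3$ are real, complex conjugation is omitted. It also confirms that each $h_k$ divides $g$.

```python
from fractions import Fraction
from itertools import permutations

def compose(s, t):
    return tuple(s[t[x]] for x in range(len(t)))

def inverse(s):
    inv = [0] * len(s)
    for x, y in enumerate(s):
        inv[y] = x
    return tuple(inv)

def sign(s):
    inversions = sum(1 for i in range(len(s)) for j in range(i + 1, len(s)) if s[i] > s[j])
    return -1 if inversions % 2 else 1

G = list(permutations(range(3)))
g = len(G)

classes, seen = [], set()
for s in G:
    if s not in seen:
        cls = {compose(compose(inverse(u), s), u) for u in G}     # the class of s: all u^{-1} s u
        seen |= cls
        classes.append(sorted(cls))
h = [len(c) for c in classes]                                     # class sizes h_k
reps = [c[0] for c in classes]
r = len(classes)

irreducible = [lambda s: 1, sign, lambda s: sum(1 for x in range(3) if s[x] == x) - 1]
X = [[chi(rep) for rep in reps] for chi in irreducible]           # X[i][k] = chi_{ik}, real here

# (10): g^{-1} sum_j h_j chi_ij conj(chi_kj) = delta_ik
rows = [[Fraction(sum(h[j] * X[i][j] * X[k][j] for j in range(r)), g) for k in range(r)]
        for i in range(r)]
# (11): sum_j conj(chi_ji) chi_jk = (g / h_i) delta_ik
cols = [[sum(X[j][i] * X[j][k] for j in range(r)) for k in range(r)] for i in range(r)]
```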


It may be noted that $h_k$ divides $g$ since, for any $s_k \in \mathcal{C}_k$, $g/h_k$ is the order of the subgroup formed by all elements of $G$ which commute with $s_k$. We are going to show finally that the degree of any irreducible representation divides the order of the group.

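Any representation $\rho : s \mapsto A(s)$ of a finite group $G$ may be extended by linearity to the set of all linear combinations of elements of $G$:

$$\rho\Big(\sum_{s \in G} \alpha_s s\Big) = \sum_{s \in G} \alpha_s A(s).$$

In particular, let $C_k$ denote the sum of all elements in the $k$-th conjugacy class $\mathcal{C}_k$ of $G$. For any $s \in \mathcal{C}_k$ and any $t \in G$ we have $st = t(t^{-1}st)$, where $t^{-1}st$ runs through $\mathcal{C}_k$ as $s$ does, and hence

$$\rho(C_k)A(t) = \sum_{s \in \mathcal{C}_k} A(st) = \sum_{s \in \mathcal{C}_k} A(ts) = A(t)\rho(C_k).$$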
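If $\rho = \rho_i$ is an irreducible representation, it follows from Schur’s lemma that

$$\rho_i(C_k) = \lambda_{ik}I_{n_i}.$$

Moreover, since

$$\operatorname{tr}\rho_i(C_k) = h_k\chi_{ik},$$

where $h_k$ again denotes the number of elements in $\mathcal{C}_k$, we must have $\lambda_{ik} = h_k\chi_{ik}/n_i$. Now let

$$C = \sum_{k=1}^{r} (g/h_k)C_kC_{k'},$$

where $\mathcal{C}_{k'}$ is the conjugacy class inverse to $\mathcal{C}_k$. (Otherwise expressed, $C = \sum_{s,t \in G} sts^{-1}t^{-1}$.) Then $\rho_i(C) = \gamma_iI_{n_i}$, where

$$\gamma_i = \sum_{k=1}^{r} (g/h_k)\lambda_{ik}\lambda_{ik'} = (g/n_i^2)\sum_{k=1}^{r} h_k\chi_{ik}\bar{\chi}_{ik} = (g/n_i)^2,$$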
by (10). If $\rho_R(C)$ is the matrix representing $C$ in the regular representation, it follows that there exists an invertible matrix $T$ such that $T^{-1}\rho_R(C)T$ is a diagonal matrix, consisting of the matrices $(g/n_i)^2I_{n_i}$, repeated $n_i$ times, for every $i$. In particular, $(g/n_i)^2$ is a root of the characteristic polynomial $\phi(\lambda) = \det(\lambda I_g - \rho_R(C))$ for every $i$. But $\rho_R(C)$ is a matrix with integer entries and hence the polynomial $\phi(\lambda) = \lambda^g + a_1\lambda^{g-1} + \cdots + a_g$ has integer coefficients $a_1, \ldots, a_g$. The following lemma, already proved in Proposition II.16 but reproved for convenience of reference here, now implies that $(g/n_i)^2$ is an integer and hence that $n_i$ divides $g$.


Lemma 12 If $\phi(\lambda) = \lambda^n + a_1\lambda^{n-1} + \cdots + a_n$ is a monic polynomial with integer coefficients $a_1, \ldots, a_n$ and $r$ a rational number such that $\phi(r) = 0$, then $r$ is an integer.

Proof We can write $r = b/c$, where $b$ and $c$ are relatively prime integers and $c > 0$. Then

$$b^n + a_1b^{n-1}c + \cdots + a_nc^n = 0$$

and hence $c$ divides $b^n$. Since $c$ and $b$ have no common prime factor, this implies $c = 1$.
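Lemma 12 is the rational root test for monic polynomials: a rational root must be an integer and, when $a_n \neq 0$, a divisor of $a_n$. The sketch uses this to find all rational roots by testing finitely many candidates. A brute-force search over small fractions illustrates that no non-integral rational roots appear, and the non-monic polynomial $2\lambda - 1$ shows that monicity is essential. Function names are illustrative.

```python
from fractions import Fraction
from math import gcd

def evaluate(coeffs, x):
    """Horner's rule for coeffs[0] x^n + coeffs[1] x^(n-1) + ... + coeffs[n]."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value

def rational_roots_monic(coeffs):
    """All rational roots of a monic integer polynomial [1, a_1, ..., a_n].

    By Lemma 12 every rational root is an integer; once the factor x^k is removed,
    an integer root divides the constant term, so only finitely many candidates remain.
    """
    roots = set()
    while len(coeffs) > 1 and coeffs[-1] == 0:      # x = 0 is a root: divide by x
        roots.add(0)
        coeffs = coeffs[:-1]
    a_n = abs(coeffs[-1])
    for d in range(1, a_n + 1):
        if a_n % d == 0:
            roots.update(x for x in (d, -d) if evaluate(coeffs, x) == 0)
    return sorted(roots)

def nonintegral_rational_roots(coeffs, max_den=12, max_num=60):
    """Brute-force search for roots b/c with 2 <= c <= max_den and |b| <= max_num."""
    return [Fraction(b, c) for c in range(2, max_den + 1) for b in range(-max_num, max_num + 1)
            if gcd(b, c) == 1 and evaluate(coeffs, Fraction(b, c)) == 0]
```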


If we apply the preceding argument to $C_k$, rather than to $C$, we see that there exists an invertible matrix $T_k$ such that $T_k^{-1}\rho_R(C_k)T_k$ is a diagonal matrix, consisting of the matrices $(h_k\chi_{ik}/n_i)I_{n_i}$, repeated $n_i$ times, for every $i$. Thus $h_k\chi_{ik}/n_i$ is a root of the characteristic polynomial $\phi_k(\lambda) = \det(\lambda I_g - \rho_R(C_k))$. Since this is a monic polynomial with integer coefficients, it follows that $h_k\chi_{ik}/n_i$ is an algebraic integer.
Let $H$ be a subgroup of finite index $n$ of a group $G$, i.e. $G$ is the disjoint union of $n$ left cosets of $H$:

$$G = s_1H \cup \cdots \cup s_nH.$$

Also, let there be given a representation $\sigma : t \mapsto A(t)$ of $H$ by linear transformations of a vector space $V$. The representation $\tilde{\sigma} : s \mapsto \tilde{A}(s)$ of $G$ induced by the given representation $\sigma$ of $H$ is defined in the following way:

Take the vector space $\tilde{V}$ to be the direct sum of $n$ subspaces $V_i$, where $V_i$ consists of all formal products $s_i \cdot v$ $(v \in V)$ with the rules of combination

$$s_i \cdot (v + v') = s_i \cdot v + s_i \cdot v', \qquad s_i \cdot (\lambda v) = \lambda(s_i \cdot v).$$

Then we set

$$\tilde{A}(s)\,s_j \cdot v = s_i \cdot A(t)v,$$

where $t$ and $s_i$ are determined from $s$ and $s_j$ by requiring that $t = s_i^{-1}ss_j \in H$. The degree of the induced representation of $G$ is thus $n$ times the degree of the original representation of $H$.
With respect to a given basis of $V$ let $A(t)$ now denote the matrix representing $t \in H$ and put $A(s) = O$ if $s \in G \setminus H$. If one adopts corresponding bases for each of the subspaces $V_i$, then the matrix $\tilde{A}(s)$ representing $s \in G$ in the induced representation is the block matrix

$$\tilde{A}(s) = \begin{pmatrix} A(s_1^{-1}ss_1) & A(s_1^{-1}ss_2) & \cdots & A(s_1^{-1}ss_n) \\ A(s_2^{-1}ss_1) & A(s_2^{-1}ss_2) & \cdots & A(s_2^{-1}ss_n) \\ \vdots & \vdots & & \vdots \\ A(s_n^{-1}ss_1) & A(s_n^{-1}ss_2) & \cdots & A(s_n^{-1}ss_n) \end{pmatrix}.$$

Evidently each row and each column contains exactly one nonzero block. It should be noted also that a different choice of coset representatives $s_i' = s_it_i$, where $t_i \in H$ $(i = 1, \ldots, n)$, yields an equivalent representation, since

$$\begin{pmatrix} A(t_1)^{-1} & & 0 \\ & \ddots & \\ 0 & & A(t_n)^{-1} \end{pmatrix} \tilde{A}(s) \begin{pmatrix} A(t_1) & & 0 \\ & \ddots & \\ 0 & & A(t_n) \end{pmatrix} = \begin{pmatrix} A({s_1'}^{-1}ss_1') & \cdots & A({s_1'}^{-1}ss_n') \\ \vdots & & \vdots \\ A({s_n'}^{-1}ss_1') & \cdots & A({s_n'}^{-1}ss_n') \end{pmatrix}.$$

Furthermore, changing the order of the cosets corresponds to performing the same permutation on the rows and columns of $\tilde{A}(s)$, and thus also yields an equivalent representation.

It follows that if $\psi$ is the character of the original representation $\sigma$ of $H$, then the character $\tilde{\psi}$ of the induced representation $\tilde{\sigma}$ of $G$ is given by

$$\tilde{\psi}(s) = \sum_{i=1}^{n} \psi(s_i^{-1}ss_i),$$

where we set $\psi(s) = 0$ if $s \notin H$. If $H$ is of finite order $h$, this can be rewritten in the form

$$\tilde{\psi}(s) = h^{-1}\sum_{u \in G} \psi(u^{-1}su), \tag{12}$$

since $\psi(t^{-1}s_i^{-1}ss_it) = \psi(s_i^{-1}ss_i)$ if $t \in H$.
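Formula (12) can be tested on $G = S_3$ and its subgroup $H = A_3$ of index 2 (an arbitrary choice), inducing the nontrivial degree-1 character $\psi$ of $H$ with $\psi(c) = \omega = e^{2\pi i/3}$ on a 3-cycle $c$. The sketch evaluates the induced character both from coset representatives and from (12). Both give the values $2, 0, -1$ on the identity, the transpositions and the 3-cycles, i.e. the 2-dimensional irreducible character of $S_3$.

```python
import cmath
from itertools import permutations

def compose(s, t):
    return tuple(s[t[x]] for x in range(len(t)))

def inverse(s):
    inv = [0] * len(s)
    for x, y in enumerate(s):
        inv[y] = x
    return tuple(inv)

G = list(permutations(range(3)))                  # S_3, g = 6
e, c = (0, 1, 2), (1, 2, 0)                       # c: 0 -> 1 -> 2 -> 0, a 3-cycle
H = [e, c, compose(c, c)]                         # H = A_3, h = 3, index n = 2
omega = cmath.exp(2j * cmath.pi / 3)
psi_values = {H[k]: omega ** k for k in range(3)} # a nontrivial degree-1 character of H

def psi(s):
    """Character of H, set equal to 0 outside H as in the text."""
    return psi_values.get(s, 0)

def induced_by_cosets(s, coset_reps):
    """tilde-psi(s) = sum_i psi(s_i^{-1} s s_i)."""
    return sum(psi(compose(compose(inverse(si), s), si)) for si in coset_reps)

def induced_by_12(s):
    """Formula (12): tilde-psi(s) = h^{-1} sum_{u in G} psi(u^{-1} s u)."""
    return sum(psi(compose(compose(inverse(u), s), u)) for u in G) / len(H)

coset_reps = [e, (1, 0, 2)]                       # G = H U (0 1)H
for s in G:
    print(s, induced_by_cosets(s, coset_reps), induced_by_12(s))
```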

From any representation of a group $G$ we can also obtain a representation of a subgroup $H$ simply by restricting the given representation to $H$. We will say that the representation of $H$ is deduced from that of $G$. There is a remarkable reciprocity between induced and deduced representations, discovered by Frobenius (1898):

Proposition 13 Let $\rho : s \mapsto A(s)$ be an irreducible representation of the finite group $G$ and $\sigma : t \mapsto B(t)$ an irreducible representation of the subgroup $H$. Then the number of times that $\sigma$ occurs in the representation of $H$ deduced from the representation $\rho$ of $G$ is equal to the number of times that $\rho$ occurs in the representation of $G$ induced by the representation $\sigma$ of $H$.
Proof Let $\chi$ denote the character of the representation $\rho$ of $G$ and $\psi$ the character of the representation $\sigma$ of $H$. By (7), the number of times that $\rho$ occurs in the complete reduction of the induced representation $\tilde\sigma$ is

$$g^{-1} \sum_{s \in G} \tilde\psi(s)\chi(s^{-1}) = (gh)^{-1} \sum_{s, u \in G} \psi(u^{-1}su)\chi(s^{-1}).$$

If we put $u^{-1}s^{-1}u = t$, $u^{-1} = v$, then $s^{-1} = v^{-1}tv$ and $(t, v)$ runs through all elements of $G \times G$ at the same time as $(s, u)$. Therefore

$$g^{-1} \sum_{s \in G} \tilde\psi(s)\chi(s^{-1}) = (gh)^{-1} \sum_{t, v \in G} \psi(t^{-1})\chi(v^{-1}tv) = h^{-1} \sum_{t \in G} \psi(t^{-1})\chi(t) = h^{-1} \sum_{t \in H} \psi(t^{-1})\chi(t),$$

which is the number of times that $\sigma$ occurs in the complete reduction of the restriction of $\rho$ to $H$.
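As a numerical sanity check of Proposition 13 (a sketch, not a proof), the code below compares both multiplicities for every pair of irreducible characters. It takes $G$ to be the symmetric group on $\{0, 1, 2\}$ and $H = \{e, (01)\}$, a subgroup that is deliberately not normal. The characters are supplied by hand and the helper names are hypothetical. Exact arithmetic with `Fraction` avoids rounding.

```python
from fractions import Fraction
from itertools import permutations

def compose(p, q):
    """p∘q: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def sign(p):
    """Sign of a permutation, from the parity of its number of inversions."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return -1 if inversions % 2 else 1

G = list(permutations(range(3)))
e, t = (0, 1, 2), (1, 0, 2)
H = [e, t]                                # a subgroup of order 2, not normal in G

# Irreducible characters of G (all real-valued) and of H.
chars_G = {
    "trivial": lambda s: 1,
    "sign": sign,
    "degree2": lambda s: sum(1 for i in range(3) if s[i] == i) - 1,
}
chars_H = {"trivial": {e: 1, t: 1}, "sign": {e: 1, t: -1}}

def induced(psi, s):
    """Formula (12), with psi taken to be 0 outside H."""
    total = sum(psi.get(compose(inverse(u), compose(s, u)), 0) for u in G)
    return Fraction(total, len(H))

def mult_in_induced(chi, psi):
    """Multiplicity of chi in the induced character, computed as in (7)."""
    return sum(induced(psi, s) * chi(inverse(s)) for s in G) / len(G)

def mult_in_restriction(chi, psi):
    """Multiplicity of psi in the restriction of chi to H."""
    return Fraction(sum(psi[inverse(x)] * chi(x) for x in H), len(H))

results = {(a, b): (mult_in_induced(chi, psi), mult_in_restriction(chi, psi))
           for a, chi in chars_G.items() for b, psi in chars_H.items()}
for key, (lhs, rhs) in results.items():
    print(key, lhs, rhs)
```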


Corollary 14 Each irreducible representation of a finite group G is contained in a 
representation induced by some irreducible representation of a given subgroup H. 


A simple, but still significant, application of these results is to the case where the order of the subgroup $H$ is half that of the whole group $G$. The subgroup $H$ is then necessarily normal (as defined in Chapter I, §7) since, for any $v \in G \setminus H$, the elements of $G \setminus H$ form both a single left coset $vH$ and a single right coset $Hv$. Hence if $s \to A(s)$ is a representation of $H$, then so also is $s \to A(v^{-1}sv)$, its conjugate representation. Since $v^2 \in H$, the conjugate of the conjugate is equivalent to the original representation. Evidently a representation is irreducible if and only if its conjugate representation is irreducible.

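On the other hand $G$ has a nontrivial character $\lambda$ of degree 1, defined by

$$\lambda(s) = 1 \text{ or } -1 \text{ according as } s \in H \text{ or } s \notin H.$$

If $\chi$ is an irreducible character of $G$, then the character $\chi\lambda$ of the product representation is also irreducible, since

$$1 = g^{-1} \sum_{s \in G} \chi(s)\chi(s^{-1}) = g^{-1} \sum_{s \in G} \chi(s)\lambda(s)\chi(s^{-1})\lambda(s^{-1}).$$

Evidently $\chi$ and $\chi\lambda$ have the same degree.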
If $\psi_i$ is the character of an irreducible representation of $H$, we will denote by $\psi_i^v$ the character of its conjugate representation. Thus

$$\psi_i^v(s) = \psi_i(v^{-1}sv).$$

The representation and its conjugate are equivalent if and only if $\psi_i^v(s) = \psi_i(s)$ for every $s \in H$.
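Consider now the character $\tilde\psi_i$ of the representation of $G$ induced by $\psi_i$. Since $H$ is a normal subgroup, it follows from (12) that

$$\tilde\psi_i(s) = \widetilde{\psi_i^v}(s) = 0 \qquad \text{if } s \in G \setminus H,$$
$$\tilde\psi_i(s) = \widetilde{\psi_i^v}(s) = \psi_i(s) + \psi_i^v(s) \qquad \text{if } s \in H.$$

Hence $\tilde\psi_i = \widetilde{\psi_i^v}$ and

$$\begin{aligned} \sum_{s \in G} \tilde\psi_i(s)\tilde\psi_i(s^{-1}) &= \sum_{s \in H} \{\psi_i(s) + \psi_i^v(s)\}\{\psi_i(s^{-1}) + \psi_i^v(s^{-1})\} \\ &= \sum_{s \in H} \psi_i(s)\psi_i(s^{-1}) + \sum_{s \in H} \psi_i^v(s)\psi_i^v(s^{-1}) \\ &\quad + \sum_{s \in H} \{\psi_i(s)\psi_i^v(s^{-1}) + \psi_i^v(s)\psi_i(s^{-1})\}. \end{aligned}$$

Consequently, by the orthogonality relations for $H$,

$$\sum_{s \in G} \tilde\psi_i(s)\tilde\psi_i(s^{-1}) = 2h + 2\sum_{s \in H} \psi_i(s)\psi_i^v(s^{-1}).$$

If $\psi_i$ and $\psi_i^v$ are inequivalent, the second term on the right vanishes and, since $2h = g$, we obtain

$$\sum_{s \in G} \tilde\psi_i(s)\tilde\psi_i(s^{-1}) = g.$$

Thus the induced representation of $G$ with character $\tilde\psi_i$ is irreducible, its degree being twice that of $\psi_i$.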
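On the other hand, if $\psi_i$ and $\psi_i^v$ are equivalent, then

$$\sum_{s \in G} \tilde\psi_i(s)\tilde\psi_i(s^{-1}) = 2g.$$

If $\tilde\psi_i = \sum_j m_j\chi_j$ is the decomposition of $\tilde\psi_i$ into irreducible characters $\chi_j$ of $G$, it follows from (8) that $\sum_j m_j^2 = 2$. This implies that $\tilde\psi_i$ decomposes into two inequivalent irreducible characters of $G$, say $\tilde\psi_i = \chi_k + \chi_l$. We will show that in fact $\chi_l = \chi_k\lambda$.

If $\chi_k(s) = 0$ for all $s \notin H$, then

$$\sum_{s \in H} \chi_k(s)\chi_k(s^{-1}) = \sum_{s \in G} \chi_k(s)\chi_k(s^{-1}) = g = 2h$$

and hence, by the same argument as that just used, the restriction of $\chi_k$ to $H$ decomposes into two inequivalent irreducible characters of $H$. Since the restriction of $\tilde\psi_i$ to $H$ is $2\psi_i$, this is a contradiction. We conclude that $\chi_k(s) \neq 0$ for some $s \notin H$, i.e. $\chi_k\lambda \neq \chi_k$. Since $\chi_k$ occurs once in the decomposition of $\tilde\psi_i$, and $\tilde\psi_i(s) = 0$ if $s \notin H$,

$$\begin{aligned} 1 &= g^{-1} \sum_{s \in G} \tilde\psi_i(s)\chi_k(s^{-1}) \\ &= g^{-1} \sum_{s \in H} \tilde\psi_i(s)\chi_k(s^{-1}) \\ &= g^{-1} \sum_{s \in H} \tilde\psi_i(s)\chi_k(s^{-1})\lambda(s^{-1}) \\ &= g^{-1} \sum_{s \in G} \tilde\psi_i(s)\chi_k(s^{-1})\lambda(s^{-1}). \end{aligned}$$

Thus $\chi_k\lambda$ also occurs once in the decomposition of $\tilde\psi_i$, and since $\chi_k\lambda \neq \chi_k$ we must have $\chi_k\lambda = \chi_l$.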

In the relation $\sum_i \psi_i(e)^2 = h$, partition the sum into a sum over pairs of distinct conjugate characters and a sum over self-conjugate characters:

$$\sum{}' \{\psi_i(e)^2 + \psi_i^v(e)^2\} + \sum{}'' \psi_i(e)^2 = h.$$

Then for the corresponding characters of $G$ we have

$$\sum{}' \tilde\psi_i(e)^2 + \sum{}'' \{\chi_k(e)^2 + \chi_l(e)^2\} = 2\sum{}' \{\psi_i(e)^2 + \psi_i^v(e)^2\} + 2\sum{}'' \psi_i(e)^2 = 2h = g.$$

Since, by Corollary 14, each irreducible character of $G$ appears in the sum on the left, it follows from (9) that each occurs exactly once. Thus we have proved


Proposition 15 Let the finite group $G$ have a subgroup $H$ of half its order. Then each pair of distinct conjugate characters of $H$ yields by induction a single irreducible character of $G$ of twice the degree, whereas each self-conjugate character of $H$ yields by induction two distinct irreducible characters of $G$ of the same degree, which coincide on $H$ and differ in sign on $G \setminus H$. The irreducible characters of $G$ thus obtained are all distinct, and every irreducible character of $G$ is obtained in this way.


We will now use Proposition 15 to determine the irreducible characters of several groups of mathematical and physical interest. Let $\mathcal{S}_n$ denote the symmetric group consisting of all permutations of the set $\{1, 2, \dots, n\}$, $\mathcal{A}_n$ the alternating group consisting of all even permutations, and $C_n$ the cyclic group consisting of all cyclic permutations. Thus $\mathcal{S}_n$ has order $n!$, $\mathcal{A}_n$ has order $n!/2$ and $C_n$ has order $n$.

The irreducible characters of the abelian group $\mathcal{A}_3 = C_3$ are all of degree 1 and can be arranged as a table in the following way, where $\omega$ is a primitive cube root of unity, say $\omega = e^{2\pi i/3} = (-1 + i\sqrt{3})/2$.

$$\begin{array}{c|ccc} \mathcal{A}_3 & e & (123) & (132) \\ \hline \psi_1 & 1 & 1 & 1 \\ \psi_2 & 1 & \omega & \omega^2 \\ \psi_3 & 1 & \omega^2 & \omega \end{array}$$


The group $\mathcal{S}_3$ contains $\mathcal{A}_3$ as a subgroup of index 2. The elements of $\mathcal{S}_3$ form three conjugacy classes: $\mathcal{C}_1$ containing only the identity element $e$, $\mathcal{C}_2$ containing the three elements $(12), (13), (23)$ of order 2, and $\mathcal{C}_3$ containing the two elements $(123), (132)$ of order 3. The irreducible character $\psi_1$ of $\mathcal{A}_3$ is self-conjugate and yields two irreducible characters of $\mathcal{S}_3$ of degree 1, the trivial character $\chi_1$ and the sign character $\chi_2 = \chi_1\lambda$. The irreducible characters $\psi_2, \psi_3$ of $\mathcal{A}_3$ are conjugate and yield a single irreducible character $\chi_3$ of $\mathcal{S}_3$ of degree 2. Thus we obtain the character table:

$$\begin{array}{c|ccc} \mathcal{S}_3 & \mathcal{C}_1 & \mathcal{C}_2 & \mathcal{C}_3 \\ \hline \chi_1 & 1 & 1 & 1 \\ \chi_2 & 1 & -1 & 1 \\ \chi_3 & 2 & 0 & -1 \end{array}$$


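The elements of $\mathcal{A}_4$ form four conjugacy classes: $\mathcal{C}_1$ containing only the identity element $e$, $\mathcal{C}_2$ containing the three elements $t_1 = (12)(34)$, $t_2 = (13)(24)$, $t_3 = (14)(23)$ of order 2, $\mathcal{C}_3$ containing four elements of order 3, namely $c, ct_1, ct_2, ct_3$, where $c = (123)$, and $\mathcal{C}_4$ containing the remaining four elements of order 3, namely $c^2, c^2t_1, c^2t_2, c^2t_3$. Moreover $N = \mathcal{C}_1 \cup \mathcal{C}_2$ is a normal subgroup of order 4, $H = \{e, c, c^2\}$ is a cyclic subgroup of order 3, and

$$\mathcal{A}_4 = HN, \qquad H \cap N = \{e\}.$$

If $\chi$ is a character of degree 1 of $H$, then a character $\psi$ of degree 1 of $\mathcal{A}_4$ is defined by

$$\psi(hn) = \chi(h) \qquad \text{for all } h \in H,\ n \in N.$$

Since $H$ is isomorphic to $\mathcal{A}_3$, we obtain in this way three characters $\psi_1, \psi_2, \psi_3$ of $\mathcal{A}_4$ of degree 1. Since $\mathcal{A}_4$ has four conjugacy classes, it has four irreducible characters; since it has order 12, and $12 = 1 + 1 + 1 + 9$, the remaining irreducible character $\psi_4$ of $\mathcal{A}_4$ has degree 3. The character table of $\mathcal{A}_4$ can be completed by means of the orthogonality relations (11) and has the following form, where again $\omega = (-1 + i\sqrt{3})/2$.

$$\begin{array}{c|cccc} \mathcal{A}_4 & \mathcal{C}_1 & \mathcal{C}_2 & \mathcal{C}_3 & \mathcal{C}_4 \\ |\mathcal{C}| & 1 & 3 & 4 & 4 \\ \hline \psi_1 & 1 & 1 & 1 & 1 \\ \psi_2 & 1 & 1 & \omega & \omega^2 \\ \psi_3 & 1 & 1 & \omega^2 & \omega \\ \psi_4 & 3 & -1 & 0 & 0 \end{array}$$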


The group $\mathcal{S}_4$ contains $\mathcal{A}_4$ as a subgroup of index 2 and $v = (12) \in \mathcal{S}_4 \setminus \mathcal{A}_4$. The elements of $\mathcal{S}_4$ form five conjugacy classes: $\mathcal{C}_1$ containing only the identity element $e$, $\mathcal{C}_2$ containing the six transpositions $(jk)$ $(1 \le j < k \le 4)$, $\mathcal{C}_3$ containing the three elements of order 2 in $\mathcal{A}_4$, $\mathcal{C}_4$ containing the eight elements of order 3, and $\mathcal{C}_5$ containing the six elements of order 4.
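Proposition 15 can be checked mechanically for $H = \mathcal{A}_4$ inside $G = \mathcal{S}_4$. The hedged sketch below builds the four irreducible characters of $\mathcal{A}_4$ as in the text:

- three characters of degree 1, pulled back from $\mathcal{A}_4/N \cong C_3$;
- the degree-3 character, taken here to be "number of fixed symbols minus 1", which agrees with the tabulated values $3, -1, 0, 0$.

The code then sorts these characters into self-conjugate characters and conjugate pairs under $v = (12)$, and predicts the degrees of the irreducible characters of $\mathcal{S}_4$. Symbols are relabelled $0, \dots, 3$ and all helper names are hypothetical.

```python
import cmath
from itertools import permutations

def compose(p, q):
    """p∘q: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def is_even(p):
    n = len(p)
    return sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j]) % 2 == 0

A4 = [p for p in permutations(range(4)) if is_even(p)]
e = (0, 1, 2, 3)
c = (1, 2, 0, 3)                                   # a 3-cycle
N = {e, (1, 0, 3, 2), (2, 3, 0, 1), (3, 2, 1, 0)}  # the normal subgroup of order 4
v = (1, 0, 2, 3)                                   # a transposition, outside A4
omega = cmath.exp(2j * cmath.pi / 3)

def coset_index(s):
    """Return k with s in c^k N (A4/N is cyclic of order 3)."""
    x, c_inv = s, inverse(c)
    for k in range(3):
        if x in N:
            return k
        x = compose(c_inv, x)
    raise ValueError("not an element of A4")

psis = [
    lambda s: 1,
    lambda s: omega ** coset_index(s),
    lambda s: omega ** (2 * coset_index(s)),
    lambda s: sum(1 for i in range(4) if s[i] == i) - 1,   # the degree-3 character
]

def conjugate(psi):
    """The conjugate character s -> psi(v^{-1} s v)."""
    return lambda s: psi(compose(inverse(v), compose(s, v)))

def agree(f, g):
    return all(abs(f(s) - g(s)) < 1e-9 for s in A4)

degrees, done = [], set()
for i, psi in enumerate(psis):
    if i in done:
        continue
    j = next(j for j, other in enumerate(psis) if agree(conjugate(psi), other))
    d = round(abs(psi(e)))
    if j == i:
        degrees += [d, d]      # self-conjugate: two characters of the same degree
    else:
        degrees.append(2 * d)  # conjugate pair: one character of twice the degree
    done.update({i, j})
print(sorted(degrees))         # [1, 1, 2, 3, 3]
```

The predicted degrees satisfy $1 + 1 + 4 + 9 + 9 = 24$, the order of $\mathcal{S}_4$.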

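The self-conjugate character $\psi_1$ of $\mathcal{A}_4$ yields two characters of $\mathcal{S}_4$ of degree 1, the trivial character $\chi_1$ and the sign character $\chi_2 = \chi_1\lambda$; the pair of conjugate characters $\psi_2, \psi_3$ of $\mathcal{A}_4$ yields an irreducible character $\chi_3$ of $\mathcal{S}_4$ of degree 2; and the self-conjugate character $\psi_4$ of $\mathcal{A}_4$ yields two irreducible characters $\chi_4, \chi_5$ of $\mathcal{S}_4$ of degree 3. The rows of the character table corresponding to $\chi_4, \chi_5$ must have the form

$$\begin{array}{c|ccccc} & \mathcal{C}_1 & \mathcal{C}_2 & \mathcal{C}_3 & \mathcal{C}_4 & \mathcal{C}_5 \\ \hline \chi_4 & 3 & x & z & w & y \\ \chi_5 & 3 & -x & z & w & -y \end{array}$$

and from the orthogonality relations (11) we obtain $z = -1$, $w = 0$, $xy = -1$. From the orthogonality relations (10) we further obtain $x + y = 0$. Hence $x^2 = 1$ and, labelling $\chi_4, \chi_5$ so that $x = 1$, the complete character table is

$$\begin{array}{c|ccccc} \mathcal{S}_4 & \mathcal{C}_1 & \mathcal{C}_2 & \mathcal{C}_3 & \mathcal{C}_4 & \mathcal{C}_5 \\ |\mathcal{C}| & 1 & 6 & 3 & 8 & 6 \\ \hline \chi_1 & 1 & 1 & 1 & 1 & 1 \\ \chi_2 & 1 & -1 & 1 & 1 & -1 \\ \chi_3 & 2 & 0 & 2 & -1 & 0 \\ \chi_4 & 3 & 1 & -1 & 0 & -1 \\ \chi_5 & 3 & -1 & -1 & 0 & 1 \end{array}$$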
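The completed table can be checked against both sets of orthogonality relations. The Python sketch below (an illustration only) does this with integer arithmetic. It relies on two facts: every value is real, and each class of $\mathcal{S}_4$ is closed under inversion. Together these give $\chi(s^{-1}) = \chi(s)$, so the sums over $G$ may be taken class by class, with the class sizes as weights.

```python
# Character table of S_4 as above; columns are the classes C_1, ..., C_5.
class_sizes = [1, 6, 3, 8, 6]
table = {
    "chi1": [1, 1, 1, 1, 1],
    "chi2": [1, -1, 1, 1, -1],
    "chi3": [2, 0, 2, -1, 0],
    "chi4": [3, 1, -1, 0, -1],
    "chi5": [3, -1, -1, 0, 1],
}
g = sum(class_sizes)          # the order of S_4
rows = list(table.values())
k = len(class_sizes)

# Row relations: sum over s in G of chi(s) chi'(s^{-1}) equals g if chi = chi', else 0.
row_products = [[sum(h * a * b for h, a, b in zip(class_sizes, r1, r2)) for r2 in rows]
                for r1 in rows]

# Column relations: sum over the irreducible characters of chi(C_i) chi(C_j)
# equals g / |C_i| if i = j, else 0.
col_products = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]

print(row_products)
print(col_products)
```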


The physical significance of these groups derives from the fact that $\mathcal{A}_4$ (resp. $\mathcal{S}_4$) is isomorphic to the group of all rotations (resp. orthogonal transformations) of $\mathbb{R}^3$ which map a regular tetrahedron onto itself. Similarly $\mathcal{A}_3$ (resp. $\mathcal{S}_3$) is isomorphic to the group of all plane rotations (resp. plane rotations and reflections) which map an equilateral triangle onto itself.

An important property of induced representations was proved by R. Brauer (1953): 
each character of a finite group is a linear combination with integer coefficients (not 
necessarily non-negative) of characters induced from characters of elementary sub- 
groups. Here a group is said to be elementary if it is the direct product of a group 
whose order is a power of a prime and a cyclic group whose order is not divisible by 
that prime. 

It may be deduced without difficulty from Brauer's theorem that, if $G$ is a finite group and $m$ the least common multiple of the orders of its elements, then (as had long been conjectured) any irreducible representation of $G$ is equivalent to a representation in the field $\mathbb{Q}(e^{2\pi i/m})$. Green (1955) has shown that Brauer's theorem is actually best possible: if each character of a finite group $G$ is a linear combination with integer coefficients of characters induced from characters of subgroups belonging to some family $\mathcal{F}$, then each elementary subgroup of $G$ is contained in a conjugate of some subgroup in $\mathcal{F}$.


7 Applications 


Character theory has turned out to be an invaluable tool in the study of abstract groups. 
We illustrate this by two results of Burnside (1904) and Frobenius (1901). It is remark- 
able, first that these applications were found so soon after the development of character 
theory and secondly that, one century later, there are still no proofs known which do 
not use character theory. 
Lemma 16 If $\rho: s \to A(s)$ is a representation of degree $n$ of a finite group $G$, then the character $\chi$ of $\rho$ satisfies

$$|\chi(s)| \le n \qquad \text{for any } s \in G.$$

Moreover, equality holds for some $s$ if and only if $A(s) = \omega I_n$, where $\omega \in \mathbb{C}$.

Proof If $s \in G$ has order $m$, there exists an invertible matrix $T$ such that

$$T^{-1}A(s)T = \operatorname{diag}[\omega_1, \dots, \omega_n],$$

where $\omega_1, \dots, \omega_n$ are $m$-th roots of unity. Hence $\chi(s) = \omega_1 + \cdots + \omega_n$ and

$$|\chi(s)| \le |\omega_1| + \cdots + |\omega_n| = n.$$

Moreover $|\chi(s)| = n$ only if $\omega_1, \dots, \omega_n$ all lie on the same ray through the origin and hence only if they are all equal, since they lie on the unit circle. But then $A(s) = \omega I_n$.


The kernel of the representation $\rho$ is the set $K_\rho$ of all $s \in G$ for which $A(s) = I_n$. Evidently $K_\rho$ is a normal subgroup of $G$. By Lemma 16, $K_\rho$ may be characterized as the set of all $s \in G$ such that $\chi(s) = n$.


Lemma 17 Let $\rho: s \to A(s)$ be an irreducible representation of degree $n$ of a finite group $G$, with character $\chi$, and let $\mathcal{C}$ be a conjugacy class of $G$ containing $h$ elements. If $h$ and $n$ are relatively prime then, for any $s \in \mathcal{C}$, either $\chi(s) = 0$ or $A(s) = \omega I_n$ for some $\omega \in \mathbb{C}$.


Proof Since $h$ and $n$ are relatively prime, there exist integers $a, b$ such that $ah + bn = 1$. Then

$$\chi(s)/n = ah\chi(s)/n + b\chi(s).$$

Since $h\chi(s)/n$ and $\chi(s)$ are algebraic integers, it follows that $\chi(s)/n$ is an algebraic integer. We may assume that $|\chi(s)| < n$, since otherwise the result follows from Lemma 16.

Suppose $s$ has order $m$. If $(k, m) = 1$, then $s^k$ has the same centralizer as $s$, so the conjugacy class containing $s^k$ also has cardinality $h$ and thus $\chi(s^k)/n$ is an algebraic integer, by what we have already proved. Hence

$$\alpha = \prod_k \chi(s^k)/n,$$

where $k$ runs through all positive integers less than $m$ and relatively prime to $m$, is also an algebraic integer. But $\chi(s^k) = f(\omega^k)$, where $\omega$ is a primitive $m$-th root of unity and

$$f(x) = x^{r_1} + \cdots + x^{r_n}$$

for some non-negative integers $r_1, \dots, r_n$ less than $m$. Thus $\alpha$ is a symmetric function of the primitive roots $\omega^k$. Since the cyclotomic polynomial

$$\Phi_m(x) = \prod_k (x - \omega^k)$$

has integer coefficients, it follows that $\alpha \in \mathbb{Q}$. Consequently, by Lemma 12, $\alpha \in \mathbb{Z}$.

But $|\alpha| < 1$, since $|\chi(s)| < n$ and $|\chi(s^k)| \le n$ for every $k$. Hence $\alpha = 0$, and thus $\chi(s^k) = 0$ for some $k$ with $(k, m) = 1$. If $g(x)$ is the monic polynomial in $\mathbb{Q}[x]$ of least positive degree such that $g(\omega^k) = 0$, then any polynomial in $\mathbb{Q}[x]$ with $\omega^k$ as a root must be divisible by $g(x)$. Since we showed in Chapter II, §5 that the cyclotomic polynomial $\Phi_m(x)$ is irreducible over the field $\mathbb{Q}$, it follows that $g(x) = \Phi_m(x)$ and that $\Phi_m(x)$ divides $f(x)$. Hence also $\chi(s) = f(\omega) = 0$.
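The proof uses the fact that $\Phi_m(x)$ has integer coefficients. A short Python sketch makes this concrete. It computes $\Phi_m$ by exact integer division, using the standard identity $x^m - 1 = \prod_{d \mid m} \Phi_d(x)$ (assumed here, not derived in this section). It then compares the result numerically with the product over primitive roots used above. The function names are hypothetical.

```python
import cmath
import math
from functools import lru_cache

def poly_divexact(num, den):
    """Divide integer polynomials given as coefficient lists, highest degree first.
    den must be monic; a nonzero remainder raises ValueError."""
    num = list(num)
    quotient = []
    while len(num) >= len(den):
        coef = num[0]                 # den is monic, so no division is needed
        quotient.append(coef)
        for i, d in enumerate(den):
            num[i] -= coef * d
        num.pop(0)                    # the leading coefficient is now zero
    if any(num):
        raise ValueError("division left a nonzero remainder")
    return quotient

@lru_cache(maxsize=None)
def cyclotomic(m):
    """Integer coefficients of Phi_m(x), from x^m - 1 = product over d | m of Phi_d(x)."""
    poly = [1] + [0] * (m - 1) + [-1]          # x^m - 1
    for d in range(1, m):
        if m % d == 0:
            poly = poly_divexact(poly, cyclotomic(d))
    return tuple(poly)

def product_over_primitive_roots(m, x):
    """Numerically evaluate the product of (x - omega^k) over 1 <= k <= m with gcd(k, m) = 1."""
    value = 1
    for k in range(1, m + 1):
        if math.gcd(k, m) == 1:
            value *= x - cmath.exp(2j * cmath.pi * k / m)
    return value

def evaluate(coeffs, x):
    """Horner evaluation of a polynomial given highest degree first."""
    result = 0
    for a in coeffs:
        result = result * x + a
    return result

print(cyclotomic(12))   # (1, 0, -1, 0, 1), i.e. x^4 - x^2 + 1
```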


Before stating the next result we recall from Chapter I, §7 that a group is said to 
be simple if it contains more than one element and has no nontrivial proper normal 
subgroup. 


Proposition 18 If a finite group $G$ has a conjugacy class $\mathcal{C}$ of cardinality $p^a$, for some prime $p$ and positive integer $a$, then $G$ is not a simple group.


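Proof If $s \in \mathcal{C}$ (so that $s \neq e$, since $\mathcal{C}$ has $p^a > 1$ elements) then, by (9),

$$\sum_{\mu=1}^{r} n_\mu \chi_\mu(s) = 0,$$

where $\chi_1, \dots, \chi_r$ are the irreducible characters of $G$ and $n_\mu$ is the degree of $\chi_\mu$.

Assume the notation chosen so that $\chi_1$ is the character of the trivial representation. If $\chi_\mu(s) = 0$ for every $\mu > 1$ for which $p$ does not divide $n_\mu$, then the displayed equation has the form $1 + p\xi = 0$, where $\xi$ is an algebraic integer. Since $-1/p$ is not an integer, this contradicts Lemma 12. Consequently, by Lemma 17, for some $\nu > 1$ we must have $A^{(\nu)}(s) = \omega I_{n_\nu}$, where $\omega \in \mathbb{C}$. The set $K_\nu$ of all elements of $G$ which are represented by the identity transformation in the $\nu$-th irreducible representation is a normal subgroup of $G$. Moreover $K_\nu \neq \{e\}$, since $K_\nu$ contains all elements $u^{-1}s^{-1}us$ $(u \in G)$, and these are not all equal to $e$ because $\mathcal{C}$ contains more than one element; and $K_\nu \neq G$, since $\nu > 1$. Thus $G$ is not simple.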


Corollary 19 If $G$ is a group of order $p^a q^b$, where $p, q$ are distinct primes and $a, b$ non-negative integers such that $a + b > 1$, then $G$ is not simple.


Proof Let $\mathcal{C}_1, \dots, \mathcal{C}_r$ be the conjugacy classes of $G$, with $\mathcal{C}_1 = \{e\}$, and let $h_k$ be the cardinality of $\mathcal{C}_k$ $(k = 1, \dots, r)$. Then $h_k$ divides the order $g$ of $G$ and

$$g = h_1 + \cdots + h_r.$$

Suppose first that $h_j = 1$ for some $j > 1$. Then $\mathcal{C}_j = \{s_j\}$, where $s_j$ commutes with every element of $G$. Thus the cyclic group $H$ generated by $s_j$ is a normal subgroup of $G$. Then $G$ is not simple even if $H = G$, since $a + b > 1$, so that a cyclic group of order $p^a q^b$ has a nontrivial proper subgroup, and any subgroup of a cyclic group is normal.

Suppose next that $h_k \neq 1$ for every $k > 1$. If $G$ is simple then, by Proposition 18, no $h_k$ with $k > 1$ is a power of a single prime, so $q$ divides $h_k$ for every $k > 1$. Since $q$ then also divides $g$, it follows that $q$ divides $h_1 = 1$, which is a contradiction.
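Proposition 18 is easy to explore numerically. The brute-force Python sketch below (illustrative only; helper names hypothetical, symbols relabelled from 0) lists the conjugacy class sizes of a permutation group and flags those that are prime powers greater than 1. For $\mathcal{A}_4$ a class of size 3 appears, consistent with the existence of its normal subgroup of order 4. For $\mathcal{A}_5$ no class size is a prime power, which is consistent with its simplicity, though of course it does not prove it.

```python
from itertools import permutations

def compose(p, q):
    """p∘q: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def is_even(p):
    n = len(p)
    return sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j]) % 2 == 0

def class_sizes(G):
    """Sorted sizes of the conjugacy classes of a finite permutation group G."""
    remaining = set(G)
    sizes = []
    while remaining:
        s = next(iter(remaining))
        cls = {compose(inverse(u), compose(s, u)) for u in G}
        sizes.append(len(cls))
        remaining -= cls
    return sorted(sizes)

def is_prime_power(n):
    """True if n = p^a for some prime p and some a >= 1."""
    if n < 2:
        return False
    p = next(d for d in range(2, n + 1) if n % d == 0)   # smallest prime factor of n
    while n % p == 0:
        n //= p
    return n == 1

S4 = list(permutations(range(4)))
A4 = [p for p in S4 if is_even(p)]
A5 = [p for p in permutations(range(5)) if is_even(p)]

for name, group in [("S4", S4), ("A4", A4), ("A5", A5)]:
    sizes = class_sizes(group)
    flagged = [h for h in sizes if is_prime_power(h)]
    print(name, sizes, "prime-power class sizes:", flagged)
```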
It has been shown by Kazarin (1990) that the normal subgroup generated by the elements of the conjugacy class $\mathcal{C}$ in Proposition 18 is solvable. Although no proof of Burnside's Proposition 18 is known which does not use character theory, Goldschmidt (1970) and Matsuyama (1973) have given a rather intricate proof of the important Corollary 19 which is purely group theoretic.

The restriction to two distinct primes in the statement of Corollary 19 is essential, since the alternating group $\mathcal{A}_5$ of order $60 = 2^2 \cdot 3 \cdot 5$ is simple. It follows at once from Corollary 19, by induction on the order, that any finite group whose order is divisible by at most two distinct primes is solvable. P. Hall (1928/1937) has used Corollary 19 to show that a finite group $G$ of order $g$ is solvable if and only if $G$ has a subgroup $H$ of order $h$ for every factorization $g = p^a h$, where $a > 0$ and $p$ is a prime not dividing $h$.

The second application of group characters, due to Frobenius, has the following 
statement: 


Proposition 20 If the finite group $G$ has a nontrivial proper subgroup $H$ such that

$$x^{-1}Hx \cap H = \{e\} \qquad \text{for every } x \in G \setminus H,$$

then $G$ contains a normal subgroup $N$ such that $G$ is the semidirect product of $H$ and $N$, i.e.

$$G = NH, \qquad H \cap N = \{e\}.$$


Proof Obviously $x^{-1}Hx = y^{-1}Hy$ if $y \in Hx$, and the hypotheses imply that $x^{-1}Hx \cap y^{-1}Hy = \{e\}$ if $y \notin Hx$. If $g, h$ are the orders of $G, H$ respectively, it follows that the number of distinct conjugate subgroups $x^{-1}Hx$ (including $H$ itself) is $n = g/h$. Furthermore the number of elements of $G$ which belong to some conjugate subgroup is $n(h - 1) + 1 = g - (n - 1)$. Thus the set $S$ of elements of $G$ which do not belong to any conjugate subgroup has cardinality $n - 1$.

Let $\psi_\mu$ be the character of an irreducible representation of $H$ and $\tilde\psi_\mu$ the character of the induced representation of $G$. By (12) and the hypotheses,

$$\tilde\psi_\mu(e) = n\psi_\mu(e), \qquad \tilde\psi_\mu(s) = 0 \ \text{ if } s \in S, \qquad \tilde\psi_\mu(s) = \psi_\mu(s) \ \text{ if } s \in H \setminus e.$$

For any fixed $\mu$, form the class function

$$\chi = \tilde\psi_\mu - \psi_\mu(e)\{\tilde\psi_1 - \chi_1\},$$

where $\psi_1$ and $\chi_1$ are the characters of the trivial representations of $H$ and $G$ respectively. Then $\chi$ is a generalized character of $G$, i.e. $\chi = \sum_\nu m_\nu\chi_\nu$ is a linear combination of irreducible characters $\chi_\nu$ with integral, but not necessarily non-negative, coefficients $m_\nu$. Moreover

$$\chi(e) = \psi_\mu(e), \qquad \chi(s) = \psi_\mu(e) \ \text{ if } s \in S, \qquad \chi(s) = \psi_\mu(s) \ \text{ if } s \in H \setminus e.$$

Hence

$$\sum_{s \in H \setminus e} \chi(s)\chi(s^{-1}) = \sum_{s \in H \setminus e} \psi_\mu(s)\psi_\mu(s^{-1}) = h - \psi_\mu(e)^2.$$

Since $S$ has cardinality $n - 1$ and $\chi$ is a class function, it follows that

$$\sum_{s \in G} \chi(s)\chi(s^{-1}) = n\{h - \psi_\mu(e)^2\} + \psi_\mu(e)^2 + (n - 1)\psi_\mu(e)^2 = g.$$

But the formula (8) holds also for generalized characters. Since $\chi(e) > 0$, we conclude that $\chi$ is in fact an irreducible character of $G$. Thus we have an irreducible representation of degree $\chi(e)$ in which the matrices representing elements of $S$ have trace $\chi(e)$. The elements of $S$ must therefore be represented by the unit matrix, i.e. they belong to the kernel $K_\mu$ of the representation.

On the other hand, for any $t \in H \setminus e$ we have

$$\sum_\mu \psi_\mu(e)\psi_\mu(t) = 0$$

and hence $\psi_\mu(t) \neq \psi_\mu(e)$ for some $\mu$. Thus the intersection of the kernels $K_\mu$ for varying $\mu$ contains just the elements of $S$ and $e$. Since each $K_\mu$ is a normal subgroup of $G$, it follows that their intersection $N = S \cup \{e\}$ is also a normal subgroup. Furthermore, since $H \cap N = \{e\}$, $HN$ has cardinality $hn = g$ and hence $HN = G$.
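The construction in the proof of Proposition 20 can be traced in a small case. The hedged sketch below takes $G = \mathcal{A}_4$ and $H = \{e, c, c^2\}$ with $c$ a 3-cycle; this is an example chosen here, not one given in the text. The code checks the hypothesis, forms $N = S \cup \{e\}$, and verifies that $N$ is a normal subgroup with $HN = G$. Symbols are relabelled $0, \dots, 3$ and all names are hypothetical.

```python
from itertools import permutations

def compose(p, q):
    """p∘q: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def is_even(p):
    n = len(p)
    return sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j]) % 2 == 0

A4 = [p for p in permutations(range(4)) if is_even(p)]
e = (0, 1, 2, 3)
c = (1, 2, 0, 3)
H = {e, c, compose(c, c)}

# The distinct conjugate subgroups x^{-1} H x.
conjugates = {frozenset(compose(inverse(x), compose(h, x)) for h in H) for x in A4}

# Hypothesis of Proposition 20: distinct conjugates of H meet only in e.
hypothesis_holds = all(len(K & L) == 1 for K in conjugates for L in conjugates if K != L)

# S = elements lying in no conjugate of H; N = S together with e.
covered = set().union(*conjugates)
N = (set(A4) - covered) | {e}

is_subgroup = all(compose(a, b) in N for a in N for b in N)
is_normal = all(compose(inverse(x), compose(a, x)) in N for x in A4 for a in N)
product_is_G = {compose(h, m) for h in H for m in N} == set(A4)
print(len(conjugates), sorted(N), is_subgroup, is_normal, product_is_G)
```

Here $n = 4$ and $h = 3$, so $h$ divides $n - 1$, as remarked below.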


A finite group $G$ which satisfies the hypotheses of Proposition 20 is said to be a Frobenius group. The subgroup $H$ is said to be a Frobenius complement and the normal subgroup $N$ a Frobenius kernel. It is readily shown that a finite permutation group is a Frobenius group if and only if it is transitive and no element except the identity fixes more than one symbol. Another characterization follows from Proposition 20: a finite group $G$ is a Frobenius group if and only if it has a nontrivial proper normal subgroup $N$ such that, if $x \in N$ and $x \neq e$, then $xy \neq yx$ for all $y \in G \setminus N$.

Frobenius groups are of some general significance and much is known about their 
structure. It is easily seen that h divides n — 1, so that the subgroups H and N have rel- 
atively prime orders. It has been shown by Thompson (1959) that the normal subgroup 
N is a direct product of groups of prime power order. The structure of H is known 
even more precisely through the work of Burnside (1901) and others. 

Applications of group characters of quite a different kind arise in the study of mole- 
cular vibrations. We describe one such application within classical mechanics, due to 
Wigner (1930). However, there are further applications within quantum mechanics, 
e.g. to the determination of the possible spectral lines in the Raman scattering of light 
by a substance whose molecules have a particular symmetry group. 

A basic problem of classical mechanics deals with the small oscillations of a system of particles about an equilibrium configuration. The equations of motion have the form

$$B\ddot{x} + Cx = 0, \qquad (13)$$

where $x \in \mathbb{R}^n$ is a vector of generalized coordinates and $B$, $C$ are positive definite real symmetric matrices. In fact the kinetic energy is $(1/2)\dot{x}^t B \dot{x}$ and, as a first approximation for $x$ near 0, the potential energy is $(1/2)x^t C x$.

Since $B$ and $C$ are positive definite, there exists (see Chapter V, §4) a non-singular matrix $T$ such that

$$T^t B T = I, \qquad T^t C T = D,$$
where $D$ is a diagonal matrix with positive diagonal elements. By the linear transformation $x = Ty$ the equations of motion are brought to the form

$$\ddot{y} + Dy = 0.$$

These 'decoupled' equations can be solved immediately: if

$$y = (\eta_1, \dots, \eta_n)^t, \qquad D = \operatorname{diag}[\omega_1^2, \dots, \omega_n^2],$$

with $\omega_k > 0$ $(k = 1, \dots, n)$, then

$$\eta_k = \alpha_k \cos \omega_k t + \beta_k \sin \omega_k t,$$

where $\alpha_k, \beta_k$ $(k = 1, \dots, n)$ are arbitrary constants of integration. Hence there exist vectors $a_k, b_k \in \mathbb{R}^n$ such that every solution of (13) is a linear combination of solutions of the form

$$a_k \cos \omega_k t, \quad b_k \sin \omega_k t \qquad (k = 1, \dots, n),$$

the so-called normal modes of oscillation. The eigenvalues of the matrix $B^{-1}C$ are the squares of the normal frequencies $\omega_1, \dots, \omega_n$.
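The simultaneous reduction $T^t B T = I$, $T^t C T = D$ can be computed numerically. The NumPy sketch below uses one standard method: a Cholesky factorization of $B$ followed by a symmetric eigendecomposition. This choice is an assumption of the illustration and is not necessarily the method of Chapter V. The example system is hypothetical: two unit masses coupled to each other and to two walls by three equal unit springs.

```python
import numpy as np

def normal_modes(B, C):
    """Return (omega, T) with T^t B T = I and T^t C T = diag(omega**2).

    Sketch: write B = L L^t (Cholesky), diagonalize the symmetric matrix
    L^{-1} C L^{-t} = Q diag(omega**2) Q^t, and take T = L^{-t} Q.
    """
    L = np.linalg.cholesky(B)              # lower triangular, B = L @ L.T
    L_inv = np.linalg.inv(L)
    eigenvalues, Q = np.linalg.eigh(L_inv @ C @ L_inv.T)
    T = L_inv.T @ Q
    return np.sqrt(eigenvalues), T

# Two unit masses, wall - spring - mass - spring - mass - spring - wall.
B = np.eye(2)
C = np.array([[2.0, -1.0], [-1.0, 2.0]])
omega, T = normal_modes(B, C)
print(omega ** 2)   # squared normal frequencies
print(T)            # columns are the mode shapes a_k (up to scaling)
```

Each column $a_k$ of $T$ satisfies $Ca_k = \omega_k^2 Ba_k$, so $a_k \cos \omega_k t$ solves (13).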

An important example is the system of particles formed by a molecule of $N$ atoms. Since the displacement of each atom from its equilibrium position is specified by three coordinates, the internal configuration of the molecule without regard to its position and orientation in space may be specified by $n = 3N - 6$ internal coordinates. The determination of the corresponding normal frequencies $\omega_1, \dots, \omega_n$ may be a formidable task even for moderate values of $N$. However, the problem is considerably reduced by taking advantage of the symmetry of the molecule.

A symmetry operation is an isometry of $\mathbb{R}^3$ which sends the equilibrium position of any atom into the equilibrium position of an atom of the same type. The set of all symmetry operations is clearly a group under composition, the symmetry group of the molecule.

For example, the methane molecule $\mathrm{CH}_4$ has four hydrogen atoms at the vertices of a regular tetrahedron and a carbon atom at the centre, from which it follows that the symmetry group of $\mathrm{CH}_4$ is isomorphic to $\mathcal{S}_4$. Similarly, the ammonia molecule $\mathrm{NH}_3$ has three hydrogen atoms and a nitrogen atom at the four vertices of a regular tetrahedron, and hence the symmetry group of $\mathrm{NH}_3$ is isomorphic to $\mathcal{S}_3$.

We return now to the general case. If $G$ is the symmetry group of the molecule, then to each $s \in G$ there corresponds a linear transformation $A(s)$ of the configuration space $\mathbb{R}^n$. Moreover the map $\rho: s \to A(s)$ is a representation of $G$. Since the kinetic and potential energies are unchanged by a symmetry operation, we have

$$A(s)^t B A(s) = B, \qquad A(s)^t C A(s) = C \qquad \text{for every } s \in G.$$

It follows that

$$B^{-1}CA(s) = A(s)B^{-1}C \qquad \text{for every } s \in G.$$


Assume the notation chosen so that the distinct $\omega$'s are $\omega_1, \dots, \omega_p$ and $\omega_k$ occurs $m_k$ times in the sequence $\omega_1, \dots, \omega_n$ $(k = 1, \dots, p)$. Thus $n = m_1 + \cdots + m_p$. If $V_k$ is the set of all $v \in \mathbb{R}^n$ such that

$$B^{-1}Cv = \omega_k^2 v,$$

then $V_k$ is an $m_k$-dimensional subspace of $\mathbb{R}^n$ $(k = 1, \dots, p)$ and $\mathbb{R}^n$ is the direct sum of $V_1, \dots, V_p$. Moreover each eigenspace $V_k$ is invariant under $A(s)$ for every $s \in G$.
Hence, by Maschke's theorem (which holds also for representations in a real vector space), $V_k$ is a direct sum of real-irreducible invariant subspaces. It follows that there exists a real non-singular matrix $T$ such that, for every $s \in G$,

$$T^{-1}A(s)T = \begin{pmatrix} A_1(s) & 0 & \cdots & 0 \\ 0 & A_2(s) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A_q(s) \end{pmatrix},$$

where $s \to A_k(s)$ is a real-irreducible representation of $G$, of degree $n_k$ say $(k = 1, \dots, q)$, and

$$T^{-1}B^{-1}CT = \begin{pmatrix} \lambda_1 I_{n_1} & 0 & \cdots & 0 \\ 0 & \lambda_2 I_{n_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_q I_{n_q} \end{pmatrix}.$$

If the real-irreducible representations $s \to A_k(s)$ $(k = 1, \dots, q)$ are also complex-irreducible, then their degrees and multiplicities can be found by character theory. Thus by decomposing the representation $\rho$ of $G$ into its irreducible components we can determine the degeneracy of the normal frequencies.

We will not consider here the modifications needed when some real-irreducible component is not also complex-irreducible. Also, it should be noted that it may happen 'accidentally' that $\lambda_j = \lambda_k$ for some $j \neq k$.

As a simple illustration of the preceding discussion we consider the ammonia molecule $\mathrm{NH}_3$. Its internal configuration may be described by the six internal coordinates $r_1, r_2, r_3$ and $\alpha_{23}, \alpha_{31}, \alpha_{12}$. Here $r_j$ is the change from its equilibrium value of the distance from the nitrogen atom to the $j$-th hydrogen atom, and $\alpha_{jk}$ is the change from its equilibrium value of the angle between the rays joining the nitrogen atom to the $j$-th and $k$-th hydrogen atoms.

We will determine the character $\chi$ of the corresponding representation $\rho$ of the symmetry group $\mathcal{S}_3$. In the notation of the character table previously given for $\mathcal{S}_3$, there is an element $s \in \mathcal{C}_3$ for which the symmetry operation $A(s)$ cyclically permutes $r_1, r_2, r_3$ and $\alpha_{23}, \alpha_{31}, \alpha_{12}$. Consequently $\chi(s) = 0$ if $s \in \mathcal{C}_3$. Also, there is an element $t \in \mathcal{C}_2$ for which the symmetry operation $A(t)$ interchanges $r_1$ with $r_2$ and $\alpha_{23}$ with $\alpha_{31}$, but fixes $r_3$ and $\alpha_{12}$. Consequently $\chi(t) = 2$ if $t \in \mathcal{C}_2$. Since it is obvious that $\chi(e) = 6$, this determines $\chi$ and we adjoin it to the character table of $\mathcal{S}_3$:

$$\begin{array}{c|ccc} \mathcal{S}_3 & \mathcal{C}_1 & \mathcal{C}_2 & \mathcal{C}_3 \\ \hline \chi_1 & 1 & 1 & 1 \\ \chi_2 & 1 & -1 & 1 \\ \chi_3 & 2 & 0 & -1 \\ \hline \chi & 6 & 2 & 0 \end{array}$$
Decomposing the character $\chi$ into its irreducible components by means of (7), we obtain $\chi = 2\chi_1 + 2\chi_3$. Since the irreducible representations of $\mathcal{S}_3$ are all real, this means that the configuration space $\mathbb{R}^6$ is the direct sum of four irreducible invariant subspaces, two of dimension 1 and two of dimension 2. Knowing what to look for, we may verify that the one-dimensional subspaces spanned by $r_1 + r_2 + r_3$ and $\alpha_{23} + \alpha_{31} + \alpha_{12}$ are invariant. Also, the two-dimensional subspace formed by all vectors $\mu_1 r_1 + \mu_2 r_2 + \mu_3 r_3$ with $\mu_1 + \mu_2 + \mu_3 = 0$ is invariant and irreducible, and so is the two-dimensional subspace formed by all vectors $\nu_1\alpha_{23} + \nu_2\alpha_{31} + \nu_3\alpha_{12}$ with $\nu_1 + \nu_2 + \nu_3 = 0$. Hence we can find a real non-singular matrix $T$ such that

$$T^{-1}B^{-1}CT = \begin{pmatrix} \lambda_1 & 0 & 0 & 0 \\ 0 & \lambda_2 & 0 & 0 \\ 0 & 0 & \lambda_3 I_2 & 0 \\ 0 & 0 & 0 & \lambda_4 I_2 \end{pmatrix}.$$

This shows that the ammonia molecule $\mathrm{NH}_3$ has two nondegenerate normal frequencies and two doubly degenerate normal frequencies.
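The decomposition $\chi = 2\chi_1 + 2\chi_3$ can be reproduced in a few lines of Python. The sketch below is illustrative and its names are hypothetical. The hydrogen atoms are relabelled $0, 1, 2$. The code builds the $6 \times 6$ permutation matrices by which $\mathcal{S}_3$ permutes $r_1, r_2, r_3, \alpha_{23}, \alpha_{31}, \alpha_{12}$, reads off the character as the trace, and computes the multiplicities by (7).

```python
from fractions import Fraction
from itertools import permutations

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def sign(p):
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def action_matrix(sigma):
    """6x6 matrix of sigma in S_3 acting on the coordinates (r1, r2, r3, a23, a31, a12).

    Coordinate j (0 <= j < 3) is r_j; coordinate 3 + i is the angle between the two
    bonds other than bond i, so sigma sends it to coordinate 3 + sigma(i)."""
    M = [[0] * 6 for _ in range(6)]
    for j in range(3):
        M[sigma[j]][j] = 1              # r_j -> r_sigma(j)
        M[3 + sigma[j]][3 + j] = 1      # angle opposite j -> angle opposite sigma(j)
    return M

G = list(permutations(range(3)))
chi = {s: sum(action_matrix(s)[i][i] for i in range(6)) for s in G}

irreducible = {
    "chi1": lambda s: 1,                                          # trivial
    "chi2": sign,                                                 # sign
    "chi3": lambda s: sum(1 for i in range(3) if s[i] == i) - 1,  # degree 2
}
# Multiplicities via (7): m_j = g^{-1} * sum over s of chi(s) chi_j(s^{-1}).
multiplicities = {name: Fraction(sum(chi[s] * f(inverse(s)) for s in G), len(G))
                  for name, f in irreducible.items()}
print(multiplicities)
```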
During the past century the character theory of finite groups has been extensively generalized to infinite groups with a topological structure. It may be helpful to give an overview here, without proofs, of this vast development. The reader wishing to pursue some particular topic may consult the references at the end of the chapter.

A topological group is a group $G$ with a topology such that the map $(s, t) \to st^{-1}$ of $G \times G$ into $G$ is continuous. Throughout the following discussion we will assume that $G$ is a topological group which, as a topological space, is locally compact and Hausdorff, i.e. any two distinct points are contained in open sets whose closures are disjoint compact sets. (A closed set $E$ in a topological space is compact if each open cover of $E$ has a finite subcover. In a metric space this is consistent with the definition of sequential compactness in Chapter I, §4.)

Let $\mathcal{C}_0(G)$ denote the set of all continuous functions $f: G \to \mathbb{C}$ such that $f(s) = 0$ for all $s$ outside some compact subset of $G$ (which may depend on $f$). A map $M: \mathcal{C}_0(G) \to \mathbb{C}$ is said to be a nonnegative linear functional if

(i) $M(f_1 + f_2) = M(f_1) + M(f_2)$ for all $f_1, f_2 \in \mathcal{C}_0(G)$,
(ii) $M(\lambda f) = \lambda M(f)$ for all $\lambda \in \mathbb{C}$ and $f \in \mathcal{C}_0(G)$,
(iii) $M(f) \ge 0$ if $f(s) \ge 0$ for every $s \in G$.


It is said to be a left (resp. right) Haar integral if, in addition, it is nontrivial, i.e. $M(f) \neq 0$ for some $f \in \mathcal{C}_0(G)$, and left (resp. right) invariant, i.e.

(iv) $M({}_t f) = M(f)$ for every $t \in G$ and $f \in \mathcal{C}_0(G)$, where ${}_t f(s) = f(t^{-1}s)$ (resp. $M(f_t) = M(f)$ for every $t \in G$ and $f \in \mathcal{C}_0(G)$, where $f_t(s) = f(st)$).


It was shown by Haar (1933) that a left Haar integral exists on any locally compact group; it was later shown to be uniquely determined apart from a positive multiplicative constant. By defining $M^*(f) = M(f^*)$, where $f^*(s) = f(s^{-1})$ for every $s \in G$, it follows that a right Haar integral also exists and is uniquely determined apart from a positive multiplicative constant.
The notions of left and right Haar integral obviously coincide if the group $G$ is abelian, and it may be shown that they also coincide if $G$ is compact or is a semisimple Lie group.

We now restrict attention to the case of a left Haar integral. It is easily seen that

$$M(\bar{f}) = \overline{M(f)},$$

where $\bar{f}(s) = \overline{f(s)}$ for every $s \in G$. If we set $(f, g) = M(f\bar{g})$, then the usual inner product properties hold:

$$(f_1 + f_2, g) = (f_1, g) + (f_2, g),$$
$$(\lambda f, g) = \lambda(f, g),$$
$$(g, f) = \overline{(f, g)},$$
$$(f, f) \ge 0, \text{ with equality only if } f = 0.$$
By the Riesz representation theorem, there is a unique positive measure $\mu$ on the $\sigma$-algebra $\mathcal{B}$ generated by the compact subsets of $G$ (cf. Chapter XI, §3) such that $\mu(K)$ is finite for every compact set $K \subseteq G$, $\mu(E)$ is the supremum of $\mu(K)$ over all compact $K \subseteq E$ for each $E \in \mathcal{B}$, and

$$M(f) = \int_G f \, d\mu \qquad \text{for every } f \in \mathcal{C}_0(G).$$

The measure $\mu$ is necessarily left invariant:

$$\mu(E) = \mu(sE) \qquad \text{for all } E \in \mathcal{B} \text{ and } s \in G,$$

where $sE = \{sx : x \in E\}$.
For $p = 1$ or 2, let $L^p(G)$ denote the set of all $\mathcal{B}$-measurable functions $f: G \to \mathbb{C}$ such that

$$\int_G |f|^p \, d\mu < \infty.$$

The definition of $M$ can be extended to $L^1(G)$ by setting

$$M(f) = \int_G f \, d\mu,$$

and the inner product can be extended to $L^2(G)$ by setting

$$(f, g) = \int_G f\bar{g} \, d\mu.$$

Moreover, with this inner product $L^2(G)$ is a Hilbert space. If we define the convolution product $f * g$ of $f, g \in L^1(G)$ by

$$(f * g)(s) = \int_G f(st)g(t^{-1}) \, d\mu(t),$$
then $L^1(G)$ is a Banach algebra and

$$M(f * g) = M(f)M(g) \qquad \text{for all } f, g \in L^1(G).$$
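For a finite group the Haar integral may be taken to be the sum over all elements. This counting-measure normalization is assumed here, and with it the convolution becomes a finite sum. The Python sketch below works on the non-abelian group of permutations of $\{0, 1, 2\}$. It checks $M(f * g) = M(f)M(g)$, and shows that point masses convolve as $\delta_a * \delta_b = \delta_{ab}$, so the algebra is not commutative. The sample functions are arbitrary and all names are hypothetical.

```python
from itertools import permutations

def compose(p, q):
    """p∘q: apply q first, then p (this is the group product pq)."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

G = list(permutations(range(3)))

def M(f):
    """Haar integral on a finite group, normalized here as counting measure."""
    return sum(f[s] for s in G)

def convolve(f, g):
    """(f * g)(s) = sum over t of f(st) g(t^{-1}), a finite analogue of the integral."""
    return {s: sum(f[compose(s, t)] * g[inverse(t)] for t in G) for s in G}

def delta(a):
    """Indicator function of the single element a."""
    return {s: 1 if s == a else 0 for s in G}

f = {s: i + 1 for i, s in enumerate(G)}
g = {s: (i * i) % 5 - 2 for i, s in enumerate(G)}
print(M(convolve(f, g)), M(f) * M(g))
```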


A unitary representation of $G$ in a Hilbert space $\mathcal{H}$ is a map $\rho$ of $G$ into the set of all linear transformations of $\mathcal{H}$ which maps the identity element $e$ of $G$ into the identity transformation $I$ of $\mathcal{H}$:

$$\rho(e) = I,$$

which preserves not only products in $G$:

$$\rho(st) = \rho(s)\rho(t) \qquad \text{for all } s, t \in G,$$

but also inner products in $\mathcal{H}$:

$$(\rho(s)u, \rho(s)v) = (u, v) \qquad \text{for all } s \in G \text{ and all } u, v \in \mathcal{H},$$

and for which the map $(s, v) \to \rho(s)v$ of $G \times \mathcal{H}$ into $\mathcal{H}$ is continuous (or, equivalently, for which the map $s \to (\rho(s)v, v)$ of $G$ into $\mathbb{C}$ is continuous at $e$ for every $v \in \mathcal{H}$).

For example, any locally compact group $G$ has a unitary representation $\rho$ in $L^2(G)$, its regular representation, defined by

$$(\rho(t)f)(s) = f(t^{-1}s) \qquad \text{for all } f \in L^2(G) \text{ and all } s, t \in G.$$


If $\rho$ is a unitary representation of $G$ in a Hilbert space $\mathcal{H}$, and if a closed subspace $V$ of $\mathcal{H}$ is invariant under $\rho(s)$ for every $s \in G$, then so also is its orthogonal complement $V^\perp$. The representation $\rho$ is said to be irreducible if the only closed subspaces of $\mathcal{H}$ which are invariant under $\rho(s)$ for every $s \in G$ are $\mathcal{H}$ and $\{0\}$. It has been shown by Gelfand and Raikov (1943) that, for any locally compact group $G$ and any $s \in G \setminus e$, there is an irreducible unitary representation $\rho$ of $G$ with $\rho(s) \neq I$.

Consider now the case in which the locally compact group $G$ is abelian. Then any irreducible unitary representation of $G$ is one-dimensional. Hence if we define a character of $G$ to be a continuous function $\chi: G \to \mathbb{C}$ such that

(i) $\chi(st) = \chi(s)\chi(t)$ for all $s, t \in G$,
(ii) $|\chi(s)| = 1$ for every $s \in G$,

then every irreducible unitary representation is a character, and vice versa.

If multiplication and inversion of characters are defined pointwise, then the set $\hat{G}$ of all characters of $G$ is again an abelian group, the dual group of $G$. Moreover, we can put a topology on $\hat{G}$ by defining a subset of $\hat{G}$ to be open if it is a union of sets of the form

$$N(\psi; \varepsilon, K) = \{\chi \in \hat{G} : |\chi(s)/\psi(s) - 1| < \varepsilon \text{ for all } s \in K\},$$

where $\psi \in \hat{G}$, $\varepsilon > 0$ and $K$ is a compact subset of $G$. Then $\hat{G}$ is not only abelian, but also a locally compact topological group.
For each fixed $s \in G$, the map $\hat{s}: \chi \to \chi(s)$ is a character of $\hat{G}$. Moreover the map $s \to \hat{s}$ is one-to-one, by the theorem of Gelfand and Raikov, and every character of $\hat{G}$ is obtained in this way. In fact the duality theorem of Pontryagin and van Kampen (1934/5) states that $G$ is isomorphic and homeomorphic to the dual group of $\hat{G}$.

The Fourier transform of a function $f \in L^1(G)$ is the function $\hat{f}: \hat{G} \to \mathbb{C}$ defined by

$$\hat{f}(\chi) = \int_G f(s)\overline{\chi(s)} \, d\mu(s),$$

where $\mu$ is the Haar measure on $G$. If $f_1, f_2 \in L^1(G) \cap L^2(G)$, then $\hat{f}_1, \hat{f}_2 \in L^2(\hat{G})$ and, with a suitable fixed normalization of the Haar measure $\hat{\mu}$ on $\hat{G}$,

$$(\hat{f}_1, \hat{f}_2)_{\hat{G}} = (f_1, f_2)_G.$$

Furthermore, the map $f \to \hat{f}$ can be uniquely extended to a unitary map of $L^2(G)$ onto $L^2(\hat{G})$. This generalizes Plancherel's theorem for Fourier integrals on the real line.
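If $f = g * h$, where $g, h \in L^1(G)$, then $f \in L^1(G)$ and

$$\hat{f}(\chi) = \hat{g}(\chi)\hat{h}(\chi) \qquad \text{for every } \chi \in \hat{G}.$$

If, in addition, $g, h \in L^2(G)$, then $\hat{f} \in L^1(\hat{G})$ and, with the same choice as before for the Haar measure $\hat{\mu}$ on $\hat{G}$, the Fourier inversion formula holds:

$$f(s) = \int_{\hat{G}} \hat{f}(\chi)\chi(s) \, d\hat{\mu}(\chi).$$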
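For the finite cyclic group $G = \mathbb{Z}/n\mathbb{Z}$ every statement above reduces to an identity about finite sums. In the Python sketch below (an illustration under stated assumptions) the characters are $\chi_k(s) = e^{2\pi iks/n}$ and the Haar measure on $G$ is counting measure. The dual measure gives each character mass $1/n$, which is the normalization that makes Plancherel's theorem and the inversion formula hold. The convolution is the additive form of the one defined earlier. The sample data are arbitrary.

```python
import cmath

def dft(f):
    """f^(k) = sum over s of f(s) * conj(chi_k(s)), with chi_k(s) = exp(2 pi i k s / n)."""
    n = len(f)
    return [sum(f[s] * cmath.exp(-2j * cmath.pi * k * s / n) for s in range(n))
            for k in range(n)]

def inverse_dft(F):
    """f(s) = integral over the dual group of F(chi) chi(s), each character having mass 1/n."""
    n = len(F)
    return [sum(F[k] * cmath.exp(2j * cmath.pi * k * s / n) for k in range(n)) / n
            for s in range(n)]

def inner(u, v, weight=1.0):
    """(u, v) = weight * sum of u(x) conj(v(x))."""
    return weight * sum(a * complex(b).conjugate() for a, b in zip(u, v))

def convolve(g, h):
    """(g * h)(s) = sum over t of g(s + t) h(-t), the text's convolution written additively."""
    n = len(g)
    return [sum(g[(s + t) % n] * h[(-t) % n] for t in range(n)) for s in range(n)]

f1 = [1, 2, 0, -1, 3]
f2 = [0, 1, 1, 2, -2]
n = len(f1)
print(abs(inner(f1, f2) - inner(dft(f1), dft(f2), 1 / n)))   # Plancherel: close to 0
```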


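The Poisson summation formula can also be extended to this general setting. Let $H$ be a closed subgroup of $G$ and let $K$ denote the factor group $G/H$. Each character of $K$ may be regarded as a character of $G$ which is trivial on $H$, so that $\hat{K}$ is a subgroup of $\hat{G}$. If the Haar measures $\mu, \nu$ on $H, \hat{K}$ are suitably chosen then, with appropriate hypotheses on $f \in L^1(G)$,

$$\int_H f(t) \, d\mu(t) = \int_{\hat{K}} \hat{f}(\psi) \, d\nu(\psi).$$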


We now give some examples (without spelling out the topologies). If $G = \mathbb{R}$ is the
additive group of all real numbers, then its characters are the functions $\chi_t: \mathbb{R} \to \mathbb{C}$,
with $t \in \mathbb{R}$, defined by

$$\chi_t(s) = e^{its}.$$

In this case $\hat{G}$ is isomorphic and homeomorphic to $G$ itself under the map $t \mapsto \chi_t$. The
Haar integral of $f \in L^1(G)$ is the ordinary Lebesgue integral

$$M(f) = \int_{-\infty}^{\infty} f(s)\,ds,$$
the Fourier transform of $f$ is

$$\hat{f}(t) = \int_{-\infty}^{\infty} f(s)\,e^{-its}\,ds,$$

and the Fourier inversion formula has the form

$$f(s) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \hat{f}(t)\,e^{its}\,dt.$$
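As a numerical sanity check of these formulas, the following Python sketch (our own illustration, not from the text) takes the Gaussian $f(s) = e^{-s^2/2}$, whose transform with this normalization is $\hat{f}(t) = \sqrt{2\pi}\,e^{-t^2/2}$, approximates each integral by a midpoint rule truncated to $[-12, 12]$, and checks the transform, the inversion formula and Plancherel's identity $\int |f|^2\,ds = (2\pi)^{-1}\int |\hat{f}|^2\,dt$. The test function, cutoff and step count are assumptions chosen for illustration.

```python
import cmath
import math


def integrate(func, a: float, b: float, steps: int = 4000) -> complex:
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (k + 0.5) * h) for k in range(steps))


def f(s: float) -> float:
    # Gaussian test function (illustrative choice)
    return math.exp(-s * s / 2)


def fhat_exact(t: float) -> float:
    # Closed form of the transform of f under fhat(t) = int f(s) e^{-its} ds
    return math.sqrt(2 * math.pi) * math.exp(-t * t / 2)


def fourier(t: float, cutoff: float = 12.0) -> complex:
    # fhat(t) = int f(s) e^{-its} ds, truncated to [-cutoff, cutoff]
    return integrate(lambda s: f(s) * cmath.exp(-1j * t * s), -cutoff, cutoff)


def inverse(s: float, cutoff: float = 12.0) -> complex:
    # f(s) = (1/2pi) int fhat(t) e^{its} dt, using the closed form of fhat
    integral = integrate(lambda t: fhat_exact(t) * cmath.exp(1j * t * s), -cutoff, cutoff)
    return integral / (2 * math.pi)


# Plancherel: with d(mu-hat) = dt/(2 pi), both sides should equal sqrt(pi)
plancherel_lhs = integrate(lambda s: f(s) ** 2, -12.0, 12.0)
plancherel_rhs = integrate(lambda t: fhat_exact(t) ** 2, -12.0, 12.0) / (2 * math.pi)
```

The factor $1/2\pi$ in `inverse` and in `plancherel_rhs` is exactly the normalization of the Haar measure on $\hat{G}$ that makes the two formulas hold.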


If $G = \mathbb{Z}$ is the additive group of all integers, then its characters are the functions
$\chi_z: \mathbb{Z} \to \mathbb{C}$, with $z \in \mathbb{C}$ and $|z| = 1$, defined by

$$\chi_z(n) = z^n.$$

Thus $\hat{G}$ is the multiplicative group of all complex numbers of absolute value 1. The
Haar integral of $f \in L^1(G)$ is

$$M(f) = \sum_{n=-\infty}^{\infty} f(n),$$

the Fourier transform of $f$ is

$$\hat{f}(e^{i\theta}) = \sum_{n=-\infty}^{\infty} f(n)\,e^{-in\theta},$$

and the Fourier inversion formula has the form

$$f(n) = \frac{1}{2\pi}\int_0^{2\pi} \hat{f}(e^{i\theta})\,e^{in\theta}\,d\theta.$$
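The pair of formulas for $\mathbb{Z}$ can be checked in the same spirit. The sketch below (our own illustration) stores a finitely supported $f$ as a Python dictionary, evaluates $\hat{f}(e^{i\theta})$, and recovers $f(n)$ from the inversion integral by an equally spaced Riemann sum on $[0, 2\pi)$. For a trigonometric polynomial whose frequencies differ by less than the number of sample points this sum is exact up to rounding, which is why only 64 samples are assumed; the test sequence is hypothetical.

```python
import cmath
import math


def fourier_z(f: dict[int, complex], theta: float) -> complex:
    """fhat(e^{i theta}) = sum_n f(n) e^{-i n theta} for a finitely supported f."""
    return sum(c * cmath.exp(-1j * n * theta) for n, c in f.items())


def inverse_z(f: dict[int, complex], n: int, samples: int = 64) -> complex:
    """f(n) = (1/2pi) int_0^{2pi} fhat(e^{i theta}) e^{i n theta} d theta, via a Riemann sum."""
    h = 2 * math.pi / samples
    total = sum(fourier_z(f, k * h) * cmath.exp(1j * n * k * h) for k in range(samples))
    return total * h / (2 * math.pi)


example = {-2: 1.5, 0: -1.0, 3: 2 + 1j}   # hypothetical finitely supported sequence
```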


Thus the classical theories of Fourier integrals and Fourier series are just special
cases. As another example, let $G = \mathbb{Q}_p$ be the additive group of all $p$-adic numbers.
The characters in this case are the functions $\chi_t: \mathbb{Q}_p \to \mathbb{C}$, with $t \in \mathbb{Q}_p$, defined by

$$\chi_t(s) = e^{2\pi i \lambda(ts)},$$

where $\lambda(x) = \sum_{j<0} x_j p^j$ if $x \in \mathbb{Q}_p$ is given by $x = \sum_{j} x_j p^j$ with $x_j \in \{0, 1, \ldots, p-1\}$
and $x_j = 0$ for all but finitely many $j < 0$. Also in this case $\hat{G}$ is isomorphic and
homeomorphic to $G$ itself under the map $t \mapsto \chi_t$. If we choose the Haar measure on
$G$ so that the measure of the compact set $\mathbb{Z}_p$ of all $p$-adic integers is 1, then the same
choice for $\hat{G}$ is the appropriate one for Plancherel’s theorem and the Fourier inversion
formula.
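For rational $x$ only finitely many digits $x_j$ with $j < 0$ are nonzero, so $\lambda(x)$ can be computed exactly. The sketch below is an illustration under the assumption that we restrict to $x \in \mathbb{Q} \subset \mathbb{Q}_p$ (the function names are ours): it finds $\lambda(x)$ as the unique $r/p^k$ with $0 \leq r < p^k$ such that $x - r/p^k$ is a $p$-adic integer, and checks that $\lambda(s+t) - \lambda(s) - \lambda(t) \in \mathbb{Z}$, which is what makes each $\chi_t$ a character.

```python
import cmath
import math
from fractions import Fraction


def padic_frac(x: Fraction, p: int) -> Fraction:
    """lambda(x) = sum_{j<0} x_j p^j for a rational x regarded as an element of Q_p."""
    x = Fraction(x)
    num, den = x.numerator, x.denominator
    k = 0
    while den % p == 0:          # split the denominator as p^k * den with den prime to p
        den //= p
        k += 1
    if k == 0:
        return Fraction(0)       # x is a p-adic integer: no nonzero digits with j < 0
    m = p ** k
    # x = num / (m * den); then x - r/m is a p-adic integer for r = num * den^{-1} mod m
    r = (num * pow(den, -1, m)) % m
    return Fraction(r, m)


def chi(t: Fraction, s: Fraction, p: int) -> complex:
    """chi_t(s) = exp(2 pi i lambda(ts)), restricted to rational arguments."""
    return cmath.exp(2j * math.pi * float(padic_frac(t * s, p)))
```

For example, `padic_frac(Fraction(1, 6), 3)` is `Fraction(2, 3)`, since $1/6 - 2/3 = -1/2$ has a denominator prime to 3.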

Consider next the case in which the group $G$ is compact, but not necessarily
abelian. In this case $C_0(G)$ coincides with the set $C(G)$ of all continuous functions
$f: G \to \mathbb{C}$. The Haar integral is both left and right invariant, and we suppose it
normalized so that the integral of the constant 1 has the value 1. Then the integral $M(f)$
of any $f \in C(G)$, or $L^1(G)$, may be called the invariant mean of $f$.
It may be shown that if $\rho$ is a unitary representation of a compact group $G$ in
a Hilbert space $\mathcal{H}$, then $\mathcal{H}$ may be represented as a direct sum $\mathcal{H} = \bigoplus_\alpha \mathcal{H}_\alpha$ of
mutually orthogonal finite-dimensional invariant subspaces $\mathcal{H}_\alpha$ such that, for every $\alpha$,
the restriction of $\rho$ to $\mathcal{H}_\alpha$ is irreducible.

In particular, any irreducible unitary representation of a compact group is finite-dimensional.
Consequently it is possible to talk about matrix elements and traces,
i.e. characters, of irreducible unitary representations. The orthogonality relations for
matrix elements and for characters of irreducible representations of finite groups
remain valid for irreducible unitary representations of compact groups if one replaces
$g^{-1}\sum_{s \in G} f(s)$, where $g$ is the order of the finite group, by the invariant mean $M(f)$.

Furthermore, any function $f \in C(G)$ can be uniformly approximated by finite
linear combinations of matrix elements of irreducible unitary representations,
and any class function $f \in C(G)$ can be uniformly approximated by finite linear
combinations of characters of irreducible unitary representations. Finally, in the
direct sum decomposition of the regular representation into finite-dimensional irreducible
unitary representations, each irreducible representation occurs as often as its
dimension.

Thus the representation theory of compact groups is completely analogous to that 
of finite groups. Indeed we may regard the representation theory of finite groups as 
a special case, since any finite group is compact with the discrete topology and any 
representation is equivalent to a unitary representation. 

An example of a compact group which is neither finite nor abelian is the group
$G = SU(2)$ of all $2 \times 2$ unitary matrices with determinant 1. The elements of $G$ have
the form

$$g = \begin{pmatrix} \gamma & \delta \\ -\bar{\delta} & \bar{\gamma} \end{pmatrix},$$

where $\gamma, \delta$ are complex numbers such that $|\gamma|^2 + |\delta|^2 = 1$. Writing $\gamma = \xi_1 + i\xi_2$,
$\delta = \xi_3 + i\xi_4$, we see that topologically $SU(2)$ is homeomorphic to the sphere

$$S^3 = \{(\xi_1, \xi_2, \xi_3, \xi_4) \in \mathbb{R}^4 : \xi_1^2 + \xi_2^2 + \xi_3^2 + \xi_4^2 = 1\}$$

and hence is compact and simply-connected (i.e. it is path-connected and any closed
path can be continuously deformed to a point).

For any integer $n \geq 0$, let $V_n$ denote the vector space of all polynomials $f(z_1, z_2)$
with complex coefficients which are homogeneous of degree $n$. Writing $z = (z_1, z_2)$,
we have

$$zg = (\gamma z_1 - \bar{\delta} z_2,\ \delta z_1 + \bar{\gamma} z_2).$$

Hence if we define a linear transformation $T_g$ of $V_n$ by $(T_g f)(z) = f(zg)$, then
$\rho_n: g \mapsto T_g$ is a representation of $SU(2)$ in $V_n$. It may be shown that this
representation is irreducible and is unitary with respect to the inner product

$$\Big(\sum_{k=0}^{n} a_k z_1^k z_2^{n-k},\ \sum_{k=0}^{n} b_k z_1^k z_2^{n-k}\Big) = \sum_{k=0}^{n} k!\,(n-k)!\,a_k \bar{b}_k.$$
Moreover, every irreducible representation of $SU(2)$ is equivalent to $\rho_n$ for some
$n \geq 0$.
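To determine the character $\chi_n$ of $\rho_n$, we observe that any $g \in G$ is conjugate in $G$
to a diagonal matrix

$$d_\theta = \begin{pmatrix} e^{i\theta} & 0 \\ 0 & e^{-i\theta} \end{pmatrix},$$

where $\theta \in \mathbb{R}$. If $f_k(z_1, z_2) = z_1^k z_2^{n-k}$ $(0 \leq k \leq n)$, then

$$(T_{d_\theta} f_k)(z_1, z_2) = (e^{i\theta} z_1)^k (e^{-i\theta} z_2)^{n-k} = e^{i(2k-n)\theta} f_k(z_1, z_2).$$

Since the polynomials $f_0, \ldots, f_n$ are a basis for $V_n$, it follows that

$$\chi_n(g) = \chi_n(d_\theta) = \sum_{k=0}^{n} e^{i(2k-n)\theta}.$$

Thus $\chi_n(I) = n + 1$, $\chi_n(-I) = (-1)^n (n + 1)$ and

$$\chi_n(g) = \frac{e^{i(n+1)\theta} - e^{-i(n+1)\theta}}{e^{i\theta} - e^{-i\theta}} = \frac{\sin (n+1)\theta}{\sin\theta} \quad \text{if } g \neq I, -I.$$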
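The trace formula can be confirmed by brute force. The following sketch (our own illustration, not the book's method) writes down the matrix of $T_g$ on $V_n$ in the basis $f_k = z_1^k z_2^{n-k}$ by expanding $f_k(zg)$ with the binomial theorem, and compares its trace with $\sin(n+1)\theta/\sin\theta$, where $\theta$ is recovered from $\operatorname{tr} g = 2\operatorname{Re}\gamma = 2\cos\theta$. Sampling $SU(2)$ by normalizing a Gaussian vector in $\mathbb{R}^4$ is an assumption made only for testing.

```python
import math
import random


def rho_matrix(gamma: complex, delta: complex, n: int) -> list[list[complex]]:
    """Matrix of T_g on V_n in the basis f_k = z1^k z2^(n-k), k = 0..n,
    where g = [[gamma, delta], [-conj(delta), conj(gamma)]] and (T_g f)(z) = f(zg)."""
    a, b = gamma, -delta.conjugate()   # first coordinate of zg:  a*z1 + b*z2
    c, d = delta, gamma.conjugate()    # second coordinate of zg: c*z1 + d*z2
    M = [[0j] * (n + 1) for _ in range(n + 1)]
    for k in range(n + 1):
        # expand (a z1 + b z2)^k (c z1 + d z2)^(n-k) with the binomial theorem
        for i in range(k + 1):
            for j in range(n - k + 1):
                coeff = (math.comb(k, i) * a**i * b**(k - i)
                         * math.comb(n - k, j) * c**j * d**(n - k - j))
                M[i + j][k] += coeff   # coefficient of z1^(i+j) z2^(n-i-j), i.e. of f_{i+j}
    return M


def trace(M: list[list[complex]]) -> complex:
    return sum(M[k][k] for k in range(len(M)))


def character_formula(gamma: complex, n: int) -> float:
    # g is conjugate to diag(e^{i theta}, e^{-i theta}) with 2 cos(theta) = 2 Re(gamma)
    theta = math.acos(max(-1.0, min(1.0, gamma.real)))
    return math.sin((n + 1) * theta) / math.sin(theta)


def random_su2(rng: random.Random) -> tuple[complex, complex]:
    x = [rng.gauss(0.0, 1.0) for _ in range(4)]
    r = math.sqrt(sum(t * t for t in x))
    return complex(x[0], x[1]) / r, complex(x[2], x[3]) / r
```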


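From this formula we can easily deduce the decomposition of the product representation
$\rho_m \otimes \rho_n$ into irreducible components. Since, for $m \geq n$ (as we may assume by symmetry),

$$\chi_m(g)\chi_n(g) = \frac{(e^{i(m+1)\theta} - e^{-i(m+1)\theta})(e^{in\theta} + e^{i(n-2)\theta} + \cdots + e^{-in\theta})}{e^{i\theta} - e^{-i\theta}} = \sum_{k=0}^{n} \frac{e^{i(m+n-2k+1)\theta} - e^{-i(m+n-2k+1)\theta}}{e^{i\theta} - e^{-i\theta}}$$
$$= \chi_{m+n}(g) + \chi_{m+n-2}(g) + \cdots + \chi_{|m-n|}(g),$$

we have the Clebsch–Gordan formula

$$\rho_m \otimes \rho_n = \rho_{m+n} \oplus \rho_{m+n-2} \oplus \cdots \oplus \rho_{|m-n|}.$$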


This formula is the group-theoretical basis for the rule in atomic physics which 
determines the possible values of the angular momentum when two systems with given 
angular momenta are coupled. 
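The character identity behind the Clebsch–Gordan formula is easy to test numerically. In the sketch below (an illustration of ours), `chi(n, theta)` evaluates $\chi_n(d_\theta) = \sum_{k=0}^n e^{i(2k-n)\theta}$ and `clebsch_gordan(m, n)` lists the labels $m+n, m+n-2, \ldots, |m-n|$; the checks compare $\chi_m\chi_n$ with the sum of the corresponding characters and confirm the dimension count $(m+1)(n+1) = \sum (k+1)$ over those labels.

```python
import cmath


def chi(n: int, theta: float) -> complex:
    """Character of rho_n at d_theta: sum_{k=0}^{n} e^{i(2k-n)theta}."""
    return sum(cmath.exp(1j * (2 * k - n) * theta) for k in range(n + 1))


def clebsch_gordan(m: int, n: int) -> list[int]:
    """Labels of the irreducible summands of rho_m (x) rho_n: m+n, m+n-2, ..., |m-n|."""
    return list(range(m + n, abs(m - n) - 1, -2))
```

For instance `clebsch_gordan(2, 1)` gives `[3, 1]`, i.e. $\rho_2 \otimes \rho_1 = \rho_3 \oplus \rho_1$, a decomposition of a 6-dimensional space into pieces of dimensions 4 and 2.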

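The complex numbers $\gamma, \delta$ with $|\gamma|^2 + |\delta|^2 = 1$ which specify the matrix
$g \in SU(2)$ can be uniquely expressed in the form

$$\gamma = e^{i(\varphi+\psi)/2} \cos(\theta/2), \qquad \delta = e^{i(\psi-\varphi)/2} \sin(\theta/2),$$

where $0 \leq \theta \leq \pi$, $0 \leq \varphi < 2\pi$, $-2\pi \leq \psi < 2\pi$. Then the invariant mean of any
continuous function $f: SU(2) \to \mathbb{C}$ is given by

$$M(f) = \frac{1}{16\pi^2} \int_{-2\pi}^{2\pi} \int_0^{2\pi} \int_0^{\pi} f(\theta, \varphi, \psi) \sin\theta \, d\theta \, d\varphi \, d\psi.$$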
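The normalization $1/16\pi^2$ and the orthogonality of characters can be checked against this formula. The sketch below (our own illustration) uses $\operatorname{tr} g = 2\operatorname{Re}\gamma = 2\cos\frac{\varphi+\psi}{2}\cos\frac{\theta}{2}$, so that $\chi_n(g) = U_n(\operatorname{Re}\gamma)$, where $U_n$ is the Chebyshev polynomial of the second kind ($U_n(\cos a) = \sin(n+1)a/\sin a$). It approximates the triple integral by a midpoint rule; the grid sizes are assumptions chosen so that $M(1) = 1$ and $M(\chi_m\chi_n) = \delta_{mn}$ are reproduced to roughly $10^{-4}$.

```python
import math


def cheb_u(n: int, x: float) -> float:
    """Chebyshev polynomial U_n(x); with x = cos a it equals sin((n+1)a)/sin a."""
    if n == 0:
        return 1.0
    u_prev, u = 1.0, 2.0 * x
    for _ in range(n - 1):
        u_prev, u = u, 2.0 * x * u - u_prev
    return u


def re_gamma(theta: float, phi: float, psi: float) -> float:
    # gamma = e^{i(phi+psi)/2} cos(theta/2), so Re(gamma) = cos((phi+psi)/2) cos(theta/2)
    return math.cos((phi + psi) / 2) * math.cos(theta / 2)


def invariant_mean(func, n_theta: int = 200, n_phi: int = 4, n_psi: int = 16) -> float:
    """Midpoint-rule approximation of M(f) over 0<=theta<=pi, 0<=phi<2pi, -2pi<=psi<2pi."""
    ht, hf, hp = math.pi / n_theta, 2 * math.pi / n_phi, 4 * math.pi / n_psi
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * ht
        weight = math.sin(theta)              # Haar density in these coordinates
        for j in range(n_phi):
            phi = (j + 0.5) * hf
            for k in range(n_psi):
                psi = -2 * math.pi + (k + 0.5) * hp
                total += func(theta, phi, psi) * weight
    return total * ht * hf * hp / (16 * math.pi ** 2)
```

Since the characters are real, $M(\chi_m\chi_n)$ is the inner product of $\chi_m$ and $\chi_n$, so the test below is exactly the orthogonality relation for characters.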


Another example of a compact group which is neither finite nor abelian is the group
$SO(3)$ of all $3 \times 3$ real orthogonal matrices with determinant 1. The representations
of $SO(3)$ may actually be obtained from those of $SU(2)$, since the two groups are
intimately related. This was already shown in §6 of Chapter I, but another version of
the proof will now be given.
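The set $V$ of all $2 \times 2$ matrices $v$ which are skew-Hermitian and have zero trace,

$$v = \begin{pmatrix} \alpha & \beta \\ -\bar{\beta} & -\alpha \end{pmatrix}, \quad \text{where } \alpha + \bar{\alpha} = 0,$$

is a three-dimensional real vector space which may be identified with $\mathbb{R}^3$ by
writing $\alpha = i\xi_3$, $\beta = \xi_1 + i\xi_2$. Any $g \in G = SU(2)$ defines a linear transformation
$T_g: v \mapsto gvg^{-1}$ of $\mathbb{R}^3$. Moreover $T_g$ is an orthogonal transformation, since if

$$T_g v = \begin{pmatrix} \alpha' & \beta' \\ -\bar{\beta}' & -\alpha' \end{pmatrix},$$

then, by the product rule for determinants ($\det(gvg^{-1}) = \det v$, and $\det v = |\alpha|^2 + |\beta|^2 = \xi_1^2 + \xi_2^2 + \xi_3^2$),

$$|\alpha'|^2 + |\beta'|^2 = |\alpha|^2 + |\beta|^2.$$

Hence $\det T_g = \pm 1$. In fact, since $T_g$ is a continuous function of $g$ and $SU(2)$ is
connected, we must have $\det T_g = \det T_I = 1$ for every $g \in G$. Thus $T_g \in SO(3)$.
Since $T_{gh} = T_g T_h$, the map $g \mapsto T_g$ is a representation of $G$.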

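Every element of $SO(3)$ is represented in this way, since

$$\text{if } a_\varphi = \begin{pmatrix} e^{-i\varphi/2} & 0 \\ 0 & e^{i\varphi/2} \end{pmatrix} \text{ then } T_{a_\varphi} = B_\varphi = \begin{pmatrix} \cos\varphi & \sin\varphi & 0 \\ -\sin\varphi & \cos\varphi & 0 \\ 0 & 0 & 1 \end{pmatrix},$$

$$\text{if } h_\theta = \begin{pmatrix} \cos\theta/2 & -\sin\theta/2 \\ \sin\theta/2 & \cos\theta/2 \end{pmatrix} \text{ then } T_{h_\theta} = C_\theta = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & \sin\theta \\ 0 & -\sin\theta & \cos\theta \end{pmatrix},$$

and every $A \in SO(3)$ can be expressed as a product $A = B_\varphi C_\theta B_\psi$, where $\varphi, \theta, \psi$ are
Euler’s angles.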

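If $T_g = I$ is the identity matrix, i.e. if $gv = vg$ for every $v \in V$, then $g = \pm I$,
since any $2 \times 2$ matrix which commutes with both the matrices

$$\begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}, \qquad \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$$

must be a scalar multiple of the identity matrix, and the only scalar multiples of $I$ in $SU(2)$
are $\pm I$. It follows that $SO(3)$ is isomorphic to the factor group $SU(2)/\{\pm I\}$.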
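The covering map can also be checked numerically. The sketch below (our own illustration, with plain nested lists standing in for a matrix library) computes the $3 \times 3$ matrix of $T_g: v \mapsto gvg^{-1}$ in the coordinates given by $\alpha = i\xi_3$, $\beta = \xi_1 + i\xi_2$. It checks that $T_{a_\varphi} = B_\varphi$ for $a_\varphi = \operatorname{diag}(e^{-i\varphi/2}, e^{i\varphi/2})$, that $T_{h_\theta} = C_\theta$ for the real rotation $h_\theta = \begin{pmatrix}\cos\theta/2 & -\sin\theta/2\\ \sin\theta/2 & \cos\theta/2\end{pmatrix}$ (an assumption: this is the choice that yields $C_\theta$ in these coordinates), and that $T_{gh} = T_gT_h$, $T_{-g} = T_g$, and $T_g$ is orthogonal with determinant 1.

```python
import cmath
import math


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def adjoint(A):
    """Conjugate transpose; for g in SU(2) this equals g^{-1}."""
    return [[complex(A[j][i]).conjugate() for j in range(len(A))] for i in range(len(A[0]))]


def su2(gamma: complex, delta: complex):
    """The matrix [[gamma, delta], [-conj(delta), conj(gamma)]]."""
    return [[gamma, delta], [-delta.conjugate(), gamma.conjugate()]]


def v_of(xi):
    """Element of V with alpha = i*xi3, beta = xi1 + i*xi2 (assumed identification)."""
    alpha, beta = 1j * xi[2], complex(xi[0], xi[1])
    return [[alpha, beta], [-beta.conjugate(), -alpha]]


def xi_of(v):
    return (v[0][1].real, v[0][1].imag, v[0][0].imag)


def T(g):
    """3x3 real matrix of v -> g v g^{-1}; column j is the image of the j-th basis vector."""
    g_inv = adjoint(g)
    cols = [xi_of(mat_mul(mat_mul(g, v_of(e)), g_inv))
            for e in ((1, 0, 0), (0, 1, 0), (0, 0, 1))]
    return [[cols[j][i] for j in range(3)] for i in range(3)]


def det3(R):
    return (R[0][0] * (R[1][1] * R[2][2] - R[1][2] * R[2][1])
            - R[0][1] * (R[1][0] * R[2][2] - R[1][2] * R[2][0])
            + R[0][2] * (R[1][0] * R[2][1] - R[1][1] * R[2][0]))


def close(A, B, tol=1e-12):
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(len(A)) for j in range(len(A[0])))
```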

These examples, and higher-dimensional generalizations, can be treated systematically
by the theory of Lie groups. A Lie group is a group $G$ with the structure of a
finite-dimensional real analytic manifold such that the map $(x, y) \mapsto xy^{-1}$ of $G \times G$
into $G$ is real analytic.

Some examples of Lie groups are

(i) a Euclidean space $\mathbb{R}^n$ under vector addition;
(ii) an $n$-dimensional torus (or $n$-torus) $\mathbb{T}^n$, i.e. the direct product of $n$ copies of the
multiplicative group $\mathbb{T}^1$ of all complex numbers of absolute value 1;
(iii) the general linear group $GL(n)$ of all real nonsingular $n \times n$ matrices under matrix
multiplication;
(iv) the orthogonal group $O(n)$ of all matrices $X \in GL(n)$ such that $X^t X = I_n$;
(v) the unitary group $U(n)$ of all complex $n \times n$ matrices $X$ such that $X^* X = I_n$,
where $X^*$ is the conjugate transpose of $X$ ($U(n)$ may be viewed as a subgroup of
$GL(2n)$);
(vi) the unitary symplectic group $Sp(n)$ of all quaternion $n \times n$ matrices $X$ such that
$X^* X = I_n$, where $X^*$ is the conjugate transpose of $X$ ($Sp(n)$ may be viewed as
a subgroup of $GL(4n)$).


The definition implies that any Lie group is a locally compact topological group.
The fifth Paris problem of Hilbert (1900) asks for a characterization of Lie groups
among all topological groups. A complete solution was finally given by Gleason,
Montgomery and Zippin (1953): a topological group can be given the structure of a
Lie group if and only if it is locally Euclidean, i.e. there is a neighbourhood of the
identity which is homeomorphic to $\mathbb{R}^n$ for some $n$.

The advantage of Lie groups over arbitrary topological groups is that, by replacing
them by their Lie algebras, they can be studied by the methods of linear analysis.

A real (resp. complex) Lie algebra is a finite-dimensional real (resp. complex) vector
space $L$ with a map $(u, v) \mapsto [u, v]$ of $L \times L$ into $L$, which is linear in $u$ and in $v$
and has the properties

(i) $[v, v] = 0$ for every $v \in L$,
(ii) $[u, [v, w]] + [v, [w, u]] + [w, [u, v]] = 0$ for all $u, v, w \in L$. (Jacobi identity)

It follows from (i) and the linearity of the bracket product (expand $[u + v, u + v] = 0$) that
$[u, v] + [v, u] = 0$ for all $u, v \in L$.


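An example of a real (resp. complex) Lie algebra is the vector space $\mathfrak{gl}(n, \mathbb{R})$ (resp.
$\mathfrak{gl}(n, \mathbb{C})$) of all $n \times n$ real (resp. complex) matrices $X$ with $[X, Y] = XY - YX$. Other
examples are easily constructed as subalgebras.

A Lie subalgebra of a Lie algebra $L$ is a vector subspace $M$ of $L$ such that $u \in M$
and $v \in M$ imply $[u, v] \in M$. Some Lie subalgebras of the complex matrix algebras
$\mathfrak{gl}(m, \mathbb{C})$ are

(i) the set $A_n$ of all $X \in \mathfrak{gl}(n+1, \mathbb{C})$ with $\operatorname{tr} X = 0$,
(ii) the set $B_n$ of all $X \in \mathfrak{gl}(2n+1, \mathbb{C})$ such that $X^t + X = 0$,
(iii) the set $C_n$ of all $X \in \mathfrak{gl}(2n, \mathbb{C})$ such that $X^t J + JX = 0$, where
$$J = \begin{pmatrix} 0 & I_n \\ -I_n & 0 \end{pmatrix},$$
(iv) the set $D_n$ of all $X \in \mathfrak{gl}(2n, \mathbb{C})$ such that $X^t + X = 0$.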
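Closure of these subspaces under the bracket, and the Jacobi identity for the commutator, can be spot-checked with exact integer arithmetic. The sketch below is our own illustration: it builds random integer elements of $A_n$, $B_n$, $C_n$, $D_n$ and checks that their brackets again satisfy the defining conditions. For $C_n$ it takes $J = \begin{pmatrix} 0 & I_n \\ -I_n & 0 \end{pmatrix}$ and uses the observation that $X = -JS$ with $S$ symmetric satisfies $X^tJ + JX = 0$, since $J^t = -J$ and $J^2 = -I$.

```python
import random


def mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def transpose(A):
    return [list(row) for row in zip(*A)]


def bracket(X, Y):
    return sub(mul(X, Y), mul(Y, X))      # [X, Y] = XY - YX


def is_zero(A):
    return all(x == 0 for row in A for x in row)


def J(n):
    """The 2n x 2n matrix [[0, I_n], [-I_n, 0]]."""
    return [[1 if j == i + n else -1 if i == j + n else 0 for j in range(2 * n)]
            for i in range(2 * n)]


def rand_matrix(size, rng):
    return [[rng.randint(-5, 5) for _ in range(size)] for _ in range(size)]


def rand_traceless(size, rng):            # element of A_{size-1}
    X = rand_matrix(size, rng)
    X[-1][-1] -= sum(X[i][i] for i in range(size))
    return X


def rand_skew(size, rng):                 # element of B_n (size 2n+1) or D_n (size 2n)
    X = rand_matrix(size, rng)
    return sub(X, transpose(X))


def rand_symplectic(n, rng):              # element of C_n: X = -J S with S symmetric
    S = rand_matrix(2 * n, rng)
    S = add(S, transpose(S))
    return [[-x for x in row] for row in mul(J(n), S)]


def in_A(X):
    return sum(X[i][i] for i in range(len(X))) == 0


def in_BD(X):
    return is_zero(add(transpose(X), X))


def in_C(X, n):
    return is_zero(add(mul(transpose(X), J(n)), mul(J(n), X)))
```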


The manifold structure of a Lie group $G$ implies that with each $s \in G$ there is associated
a real vector space, the tangent space at $s$. The group structure of the Lie group
$G$ implies that the tangent space at the identity $e$ of $G$ is a real Lie algebra, which will
be denoted by $L(G)$. For example, if $G = GL(n)$ then $L(G) = \mathfrak{gl}(n, \mathbb{R})$. The properties
of Lie groups are mirrored by those of their Lie algebras in the following way.
For every real Lie algebra $L$, there is a simply-connected Lie group $\tilde{G}$ such that
$L(\tilde{G}) = L$. Moreover, $\tilde{G}$ is uniquely determined up to isomorphism by $L$. A connected
Lie group $G$ has $L(G) = L$ if and only if $G$ is isomorphic to a factor group $\tilde{G}/D$,
where $D$ is a discrete subgroup of the centre of $\tilde{G}$.

A Lie subgroup of a Lie group G is a real analytic submanifold H of G which 
is also a Lie group under the restriction to H of the group structure on G. It may be 
shown that a subgroup H of a Lie group G is a Lie subgroup if it is a closed subset of 
G, and is a connected Lie subgroup if and only if it is path-connected. Thus any closed 
subgroup of GL(n) is a Lie group. 

If $H$ is a Lie subgroup of the Lie group $G$, then $L(H)$ is a Lie subalgebra of $L(G)$.
Moreover, if $M$ is a Lie subalgebra of $L(G)$, there is a unique connected Lie subgroup
$H$ of $G$ such that $L(H) = M$.

If $G_1, G_2$ are Lie groups, then a map $f: G_1 \to G_2$ is a Lie group homomorphism
if it is an analytic map, regarding $G_1, G_2$ as manifolds, and a homomorphism, regarding
$G_1, G_2$ as groups. It may be shown that any continuous map $f: G_1 \to G_2$ which
is a group homomorphism is actually a Lie group homomorphism. (It follows that a
locally Euclidean topological group can be given the structure of a Lie group in only
one way.)

If $L_1, L_2$ are Lie algebras, then a map $T: L_1 \to L_2$ is a Lie algebra homomorphism
if it is linear and $T[u, v] = [Tu, Tv]$ for all $u, v \in L_1$. If $G_1, G_2$ are Lie groups
and if $f: G_1 \to G_2$ is a Lie group homomorphism, then the derivative of $f$ at the
identity, $f'(e): L(G_1) \to L(G_2)$, is a Lie algebra homomorphism. Moreover, if $G_1$ is
connected then distinct Lie group homomorphisms give rise to distinct Lie algebra
homomorphisms, and if $G_1$ is simply-connected then every Lie algebra homomorphism
$L(G_1) \to L(G_2)$ arises from some Lie group homomorphism. (In particular, the
representations of a connected Lie group are determined by the representations of its Lie
algebra.)

A Lie algebra $L$ is abelian if $[u, v] = 0$ for all $u, v \in L$. A connected Lie group
is abelian if and only if its Lie algebra is abelian. Since the Euclidean space $\mathbb{R}^n$ is
a simply-connected Lie group with an $n$-dimensional abelian Lie algebra, it follows
that any $n$-dimensional connected abelian Lie group is isomorphic to a direct product
$\mathbb{R}^{n-k} \times \mathbb{T}^k$ (where $\mathbb{T}^k$ is a $k$-torus) for some $k$ such that $0 \leq k \leq n$.

An ideal of a Lie algebra $L$ is a vector subspace $M$ of $L$ such that $u \in L$ and
$v \in M$ imply $[u, v] \in M$. A connected Lie subgroup $H$ of a connected Lie group $G$ is
a normal subgroup if and only if $L(H)$ is an ideal of $L(G)$.

A Lie algebra L is simple if it has no ideals except {0} and L and is not one- 
dimensional, and semisimple if it has no abelian ideal except {0}. It may be shown that 
a Lie algebra is semisimple if and only if it is the direct sum of finitely many ideals, 
each of which is a simple Lie algebra. 

A Lie group is semisimple if it is connected and has no connected abelian normal 
Lie subgroup except {e}. It follows that a connected Lie group G is semisimple if and 
only if its Lie algebra L(G) is semisimple. 

We turn our attention now to compact Lie groups. It may be shown that a compact 
topological group can be given the structure of a Lie group if and only if it is finite- 
dimensional and locally connected. Furthermore, a compact Lie group is isomorphic 
to a closed subgroup of GL(n) for some n. Other basic results are: 
(i) a compact Lie group has only finitely many connected components;

(ii) a connected compact Lie group is abelian if and only if it is an $n$-torus $\mathbb{T}^n$ for
some $n$;

(iii) a semisimple connected compact Lie group $G$ has a finite centre. Moreover the
simply-connected Lie group $\tilde{G}$ such that $L(\tilde{G}) = L(G)$ is not only semisimple
but also compact;

(iv) an arbitrary connected compact Lie group $G$ has the form $G = ZH$, where $Z, H$
are connected compact Lie subgroups, $H$ is semisimple and $Z$ is the component
of the centre of $G$ which contains the identity $e$.


These results essentially reduce the classification of arbitrary compact Lie groups
to the classification of those which are semisimple and simply-connected. It may be
shown that the latter are in one-to-one correspondence with the semisimple complex
Lie algebras. Since a semisimple Lie algebra is a direct sum of finitely many
simple Lie algebras, we are thus reduced to the classification of the simple complex
Lie algebras. The miracle is that these can be completely enumerated: the
non-isomorphic simple complex Lie algebras consist of the four infinite families
$A_n$ $(n \geq 1)$, $B_n$ $(n \geq 2)$, $C_n$ $(n \geq 3)$, $D_n$ $(n \geq 4)$, of dimensions $n(n+2)$, $n(2n+1)$,
$n(2n+1)$, $n(2n-1)$ respectively, and five exceptional Lie algebras $G_2, F_4, E_6, E_7, E_8$
of dimensions 14, 52, 78, 133, 248 respectively.
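The four dimension formulas can be confirmed for small $n$ by computing the dimension of the space of matrices satisfying each defining linear condition: it is $N^2$ minus the rank of the condition, regarded as a linear map on $N \times N$ matrices. The sketch below (our own illustration) does this with exact rational Gaussian elimination; the conditions are $\operatorname{tr} X = 0$, $X^t + X = 0$ and $X^tJ + JX = 0$ with the standard choice $J = \begin{pmatrix} 0 & I_n \\ -I_n & 0\end{pmatrix}$, assumed here.

```python
from fractions import Fraction


def rank(vectors):
    """Rank of a list of rational vectors by Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def madd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def transpose(A):
    return [list(row) for row in zip(*A)]


def J(n):
    return [[1 if j == i + n else -1 if i == j + n else 0 for j in range(2 * n)]
            for i in range(2 * n)]


def kernel_dim(N, condition):
    """dim {X in gl(N) : condition(X) = 0} for a linear map `condition`."""
    images = []
    for a in range(N):
        for b in range(N):
            E = [[1 if (i, j) == (a, b) else 0 for j in range(N)] for i in range(N)]
            images.append([x for row in condition(E) for x in row])
    return N * N - rank(images)


def classical_dims(n):
    Jn = J(n)
    return (
        kernel_dim(n + 1, lambda X: [[sum(X[i][i] for i in range(n + 1))]]),      # A_n
        kernel_dim(2 * n + 1, lambda X: madd(transpose(X), X)),                   # B_n
        kernel_dim(2 * n, lambda X: madd(mul(transpose(X), Jn), mul(Jn, X))),     # C_n
        kernel_dim(2 * n, lambda X: madd(transpose(X), X)),                       # D_n
    )
```

The formulas for $C_n$ and $D_n$ hold for every $n \geq 1$; the restrictions $n \geq 3$ and $n \geq 4$ in the classification only serve to avoid listing isomorphic algebras twice.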


To the simple complex Lie algebra $A_n$ corresponds the compact Lie group
$SU(n+1)$ of all matrices in $U(n+1)$ with determinant 1; to $B_n$ corresponds the
compact Lie group $SO(2n+1)$ of all matrices in $O(2n+1)$ with determinant 1; to
$C_n$ corresponds the compact Lie group $Sp(n)$ (whose matrices all have determinant 1),
and to $D_n$ corresponds the compact Lie group $SO(2n)$ of all matrices in $O(2n)$ with
determinant 1. The groups $SU(n)$ and $Sp(n)$ are simply-connected, whereas for $n \geq 3$ the
group $SO(n)$ is connected but not simply-connected: its simply-connected covering group
$Spin(n)$ is a double covering. The compact Lie groups corresponding to the five exceptional simple complex
Lie algebras are all related to the algebra of octonions or Cayley numbers.


Space does not permit consideration here of the methods by which this classifi- 
cation has been obtained, although the methods are just as significant as the result. 
Indeed they provide a uniform approach to many problems involving the classical 
groups, giving explicit formulas for the invariant mean and for the characters of all 
irreducible representations. There is also a notable connection with groups generated 
by reflections. 


The classification of arbitrary semisimple Lie groups reduces similarly to the clas- 
sification of simple real Lie algebras, which have also been completely enumerated. 
The irreducible unitary representations of non-compact semisimple Lie groups have 
been extensively studied, notably by Harish-Chandra. However, the non-compact case 
is essentially more difficult than the compact, since any nontrivial representation is 
infinite-dimensional, and the results are still incomplete. Much of the motivation for 
this work has come from elementary particle physics where, in the original formula- 
tion of Wigner (1939), a particle (specified by its mass and spin) corresponds to an 
irreducible unitary representation of the inhomogeneous Lorentz group. 
9 Further Remarks


The history of Legendre’s conjectures on primes in arithmetic progressions is 
described in Vol. I of Dickson [13]. Dirichlet’s original proof is contained in [33], 
pp. 313-342. Although no simple general proof of Dirichlet’s theorem is known, 
simple proofs have been given for the existence of infinitely many primes congruent 
to 1 mod $m$; see Sedrakian and Steinig [41].

If all arithmetic progressions $a, a + m, \ldots$ with $(a, m) = 1$ contain a prime,
then they all contain infinitely many, since for any $k \geq 1$ the arithmetic progression
$a + m^k, a + 2m^k, \ldots$ contains a prime.

It may be shown that any finite abelian group $G$ is isomorphic to its dual group $\hat{G}$
(although not in a canonical way) by expressing G as a direct product of cyclic groups; 
see, for example, W. & F. Ellison [15]. 

In the final step of the proof of Proposition 7 we have followed Bateman [3]. Other 
proofs that $L(1, \chi) \neq 0$ for every $\chi \neq \chi_1$, which do not use Proposition 6, are given
in Hasse [21]. The functional equation for Dirichlet L-functions was first proved by 
Hurwitz (1882). For proofs of some of the results stated at the end of §3, see Bach and 
Sorenson [1], Davenport [12], W. & F. Ellison [15] and Prachar [40]. Funakura [18] 
characterizes Dirichlet L-functions by means of their analytic properties. 

The history of the theory of group representations and group characters is de- 
scribed in Curtis [10]. More complete expositions of the subject than ours are given by 
Serre [42], Feit [16], Huppert [27], and Curtis and Reiner [11]. The proof given here 
that the degree of an irreducible representation divides the order of the group is not 
Frobenius’ original proof. It first appeared in a footnote of a paper by Schur (1904) on 
projective representations, where it is attributed to Frobenius. Zassenhaus [50] gives 
an interpretation in terms of Casimir operators. 

A character-free proof of Corollary 19 is given in Gagen [19]. P. Hall’s theorem is 
proved in Feit [16], for example. Frobenius groups are studied further in Feit [16] and 
Huppert [27]. 

For physical and chemical applications of group representations, see Cornwell [9], 
Janssen [29], Meijer [36], Birman [4] and Wilson et al. [48]. 

Dym and McKean [14] give an outward-looking introduction to the classical theory 
of Fourier series and integrals. The formal definition of a topological group is due to 
Schreier (1926). The Haar integral is discussed by Nachbin [37]. General introductions 
to abstract harmonic analysis are given by Weil [46], Loomis [34] and Folland [17]. 
More detailed information on topological groups and their representations is contained 
in Pontryagin [39], Hewitt and Ross [23] and Gurarii [20]. A simple proof that the
additive group $\mathbb{Q}_p$ of all $p$-adic numbers is isomorphic to its dual group is given by
Washington [45]. In the adelic approach to algebraic number theory this isomorphism 
lies behind the functional equation of the Riemann zeta function; see, for example, 
Lang [31]. 

For Hilbert’s fifth problem, see Yang [49] and Hirschfeld [24]. The correspondence 
between Lie groups and Lie algebras was set up by Sophus Lie (1873-1893) in a purely 
local way, i.e. between neighbourhoods of the identity in the Lie group and of zero 
in the Lie algebra. Over half a century elapsed before the correspondence was made 
global by Cartan, Pontryagin and Chevalley. A basic property of solvable Lie algebras 
was established by Lie, but we owe to Killing (1888-1890) the remarkable classification
of simple complex Lie algebras. Some gaps and inaccuracies in Killing’s pioneering
work were filled and corrected in the thesis of Cartan (1894). The classification of
simple real Lie algebras is due to Cartan (1914). The representation theory of semisim- 
ple Lie algebras and compact semisimple Lie groups is the creation of Cartan (1913) 
and Weyl (1925-7). The introduction of groups generated by reflections is due to Weyl. 

For the theory of Lie groups, see Chevalley [7], Warner [44], Varadarajan [43], 
Helgason [22] and Barut and Raczka [2]. The last reference also has information 
on representations of noncompact Lie groups and applications to quantum theory. 
The purely algebraic theory of Lie algebras is discussed by Jacobson [28] and 
Humphreys [25]. Niederle [38] gives a survey of the applications of the exceptional 
Lie algebras and Lie superalgebras in particle physics. Groups generated by reflections 
are treated by Humphreys [26], Bourbaki [5] and Kac [30], while Cohen [8] gives a 
useful overview. 

The character theory of locally compact abelian groups, whose roots lie in 
Dirichlet’s theorem on primes in arithmetic progressions, has given something back to 
number theory in the adelic approach to algebraic number fields; see the thesis of Tate, 
reproduced (pp. 305-347) in Cassels and Frohlich [6], Lang [31] and Weil [47]. For a 
broad historical perspective and future plans, see Mackey [35] and Langlands [32]. 
XI


Uniform Distribution and Ergodic Theory 


A trajectory of a system which is evolving with time may be said to be ‘recurrent’ if it 
keeps returning to any neighbourhood, however small, of its initial point, and ‘dense’ 
if it passes arbitrarily near to every point. It may be said to be ‘uniformly distributed’ 
if the proportion of time it spends in any region tends asymptotically to the ratio of the 
volume of that region to the volume of the whole space. In the present chapter these 
notions will be made precise and some fundamental properties derived. The subject of 
dynamical systems has its roots in mechanics, but we will be particularly concerned 
with its applications in number theory. 


1 Uniform Distribution 


Before introducing our subject, we establish the following interesting result: 


Lemma 0 Let $J = [a, b]$ be a compact interval and $f_n: J \to \mathbb{R}$ a sequence of
nondecreasing functions. If $f_n(t) \to f(t)$ for every $t \in J$ as $n \to \infty$, where $f: J \to \mathbb{R}$
is a continuous function, then $f_n(t) \to f(t)$ uniformly on $J$.


Proof Evidently $f$ is also nondecreasing. Furthermore, since $J$ is compact, $f$ is
uniformly continuous on $J$. It follows that, for any $\varepsilon > 0$, there is a subdivision
$a = t_0 < t_1 < \cdots < t_m = b$ such that

$$f(t_k) - f(t_{k-1}) < \varepsilon \quad (k = 1, \ldots, m).$$

We can choose a positive integer $p$ so that, for all $n > p$,

$$|f_n(t_k) - f(t_k)| < \varepsilon \quad (k = 0, 1, \ldots, m).$$

If $t \in J$, then $t \in [t_{k-1}, t_k]$ for some $k \in \{1, \ldots, m\}$. Hence

$$f_n(t) - f(t) \leq f_n(t_k) - f(t_{k-1}) = f_n(t_k) - f(t_k) + f(t_k) - f(t_{k-1}) < 2\varepsilon$$

and similarly

$$f_n(t) - f(t) \geq f_n(t_{k-1}) - f(t_k) = f_n(t_{k-1}) - f(t_{k-1}) + f(t_{k-1}) - f(t_k) > -2\varepsilon.$$

Thus $|f_n(t) - f(t)| < 2\varepsilon$ for every $t \in J$ if $n > p$.
For any real number $\xi$, let $\lfloor \xi \rfloor$ denote again the greatest integer $\leq \xi$ and let

$$\{\xi\} = \xi - \lfloor \xi \rfloor$$

denote the fractional part of $\xi$. We are going to prove that, if $\xi$ is irrational, then the
sequence $(\{n\xi\})$ of the fractional parts of the multiples of $\xi$ is dense in the unit interval
$I = [0, 1]$, i.e. every point of $I$ is a limit point of the sequence.

It is sufficient to show that the points $z_n = e^{2\pi i n\xi}$ $(n = 1, 2, \ldots)$ are dense on the
unit circle. Since $\xi$ is irrational, the points $z_n$ are all distinct and $z_n \neq \pm 1$. Consequently
they have a limit point on the unit circle. Thus, for any given $\varepsilon > 0$, there exist
positive integers $m, r$ such that

$$|z_{m+r} - z_m| < \varepsilon.$$

But

$$|z_{m+r} - z_m| = |z_r - 1| = |z_{n+r} - z_n| \quad \text{for every } n \in \mathbb{N}.$$

If we write $z_r = e^{\pm 2\pi i\theta}$, where $0 < \theta < 1/2$, then $z_{kr} = e^{\pm 2\pi i k\theta}$ $(k = 1, 2, \ldots)$. Define the
positive integer $N$ by $1/(N+1) < \theta < 1/N$. Then the points $z_r, z_{2r}, \ldots, z_{Nr}$ follow
one another in order on the unit circle and every point of the unit circle is distant less
than $\varepsilon$ from one of these points.

It may be asked if the sequence $(\{n\xi\})$ is not only dense in $I$, but also spends
‘the right amount of time’ in each subinterval of $I$. To make the question precise we
introduce the following definition:

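A sequence $(\xi_n)$ of real numbers is said to be uniformly distributed mod 1 if, for
all $\alpha, \beta$ with $0 \leq \alpha < \beta \leq 1$,

$$\varphi_{\alpha,\beta}(N)/N \to \beta - \alpha \quad \text{as } N \to \infty,$$

where $\varphi_{\alpha,\beta}(N)$ is the number of positive integers $n \leq N$ such that $\alpha \leq \{\xi_n\} < \beta$.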
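In this definition we need only require that $\varphi_{0,\alpha}(N)/N \to \alpha$ for every $\alpha \in (0, 1)$,
since

$$\varphi_{\alpha,\beta}(N) = \varphi_{0,\beta}(N) - \varphi_{0,\alpha}(N)$$

and hence

$$|\varphi_{\alpha,\beta}(N)/N - (\beta - \alpha)| \leq |\varphi_{0,\beta}(N)/N - \beta| + |\varphi_{0,\alpha}(N)/N - \alpha|.$$

It follows from Lemma 0, with $f_n(t) = \varphi_{0,t}(n)/n$ and $f(t) = t$, that the sequence $(\xi_n)$
is uniformly distributed mod 1 if and only if

$$\varphi_{\alpha,\beta}(N)/N \to \beta - \alpha \quad \text{as } N \to \infty$$

uniformly for all $\alpha, \beta$ with $0 \leq \alpha < \beta \leq 1$.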
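A direct count illustrates the definition. The sketch below (our own illustration, using floating-point fractional parts, which is adequate for the moderate $N$ assumed here) computes $\varphi_{\alpha,\beta}(N)/N$ for $\xi = \sqrt{2}$ and, for contrast, for the rational $\xi = 1/4$, whose fractional parts take only the values $0, 1/4, 1/2, 3/4$ and so never enter $[0.1, 0.2)$.

```python
import math


def proportion(xi: float, alpha: float, beta: float, N: int) -> float:
    """phi_{alpha,beta}(N)/N: the fraction of n <= N with alpha <= {n xi} < beta."""
    count = 0
    for n in range(1, N + 1):
        if alpha <= (n * xi) % 1.0 < beta:
            count += 1
    return count / N
```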

It was first shown by Bohl (1909) that, if $\xi$ is irrational, the sequence $(n\xi)$ is uniformly
distributed mod 1 in the sense of our definition. Later Weyl (1914, 1916) established
this result by a less elementary, but much more general argument, which was
equally applicable to multi-dimensional problems. The following two theorems, due to
Weyl, replace the problem of showing that a sequence is uniformly distributed mod 1
by a more tractable analytic problem.
Theorem 1 A real sequence $(\xi_n)$ is uniformly distributed mod 1 if and only if, for
every function $f: I \to \mathbb{C}$ which is Riemann integrable,

$$N^{-1} \sum_{n=1}^{N} f(\{\xi_n\}) \to \int_I f(t)\,dt \quad \text{as } N \to \infty. \tag{1}$$
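Proof For any $\alpha, \beta \in I$ with $\alpha < \beta$, let $\chi_{\alpha,\beta}$ denote the indicator function of the
interval $[\alpha, \beta)$, i.e.

$$\chi_{\alpha,\beta}(t) = \begin{cases} 1 & \text{for } \alpha \leq t < \beta, \\ 0 & \text{otherwise.} \end{cases}$$

Since

$$\int_I \chi_{\alpha,\beta}(t)\,dt = \beta - \alpha,$$

the definition of uniform distribution can be rephrased by saying that the sequence $(\xi_n)$
is uniformly distributed mod 1 if and only if, for all choices of $\alpha$ and $\beta$,

$$N^{-1} \sum_{n=1}^{N} \chi_{\alpha,\beta}(\{\xi_n\}) \to \int_I \chi_{\alpha,\beta}(t)\,dt \quad \text{as } N \to \infty.$$

Thus the sequence $(\xi_n)$ is certainly uniformly distributed mod 1 if (1) holds for every
Riemann integrable function $f$.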

Suppose now that the sequence $(\xi_n)$ is uniformly distributed mod 1. Then (1) holds
not only for every function $f = \chi_{\alpha,\beta}$, but also for every finite linear combination
of such functions, i.e. for every step-function $f$. But, for any real-valued Riemann
integrable function $f$ and any $\varepsilon > 0$, there exist step-functions $f_1, f_2$ such that

$$f_1(t) \leq f(t) \leq f_2(t) \quad \text{for every } t \in I$$

and

$$\int_I (f_2(t) - f_1(t))\,dt < \varepsilon.$$

Hence

$$N^{-1} \sum_{n=1}^{N} f(\{\xi_n\}) - \int_I f(t)\,dt < N^{-1} \sum_{n=1}^{N} f_2(\{\xi_n\}) - \int_I f_2(t)\,dt + \varepsilon < 2\varepsilon \quad \text{for all large } N,$$

and similarly

$$N^{-1} \sum_{n=1}^{N} f(\{\xi_n\}) - \int_I f(t)\,dt > -2\varepsilon \quad \text{for all large } N.$$

Thus (1) holds when the Riemann integrable function $f$ is real-valued and also, by
linearity, when it is complex-valued.
A converse of Theorem 1 has been proved by de Bruijn and Post (1968): if a function
$f: I \to \mathbb{C}$ has the property that

$$\lim_{N \to \infty} N^{-1} \sum_{n=1}^{N} f(\{\xi_n\})$$

exists for every sequence $(\xi_n)$ which is uniformly distributed mod 1, then $f$ is Riemann
integrable.
In the statement of the next result, and throughout the rest of the chapter, we use 
the abbreviation 


$$e(t) = e^{2\pi i t}.$$


In the proof of the next result we use the Weierstrass approximation theorem: any
continuous function $f: I \to \mathbb{C}$ of period 1 is the uniform limit of a sequence $(f_n)$ of
trigonometric polynomials. In fact, as Fejér (1904) showed, one can take $f_n$ to be the
arithmetic mean $(S_0 + \cdots + S_{n-1})/n$, where

$$S_m = S_m(x) = \sum_{h=-m}^{m} c_h e(hx), \qquad c_h = \int_I f(t)\,e(-ht)\,dt,$$

is the $m$-th partial sum of the Fourier series for $f$. This yields the explicit formula

$$f_n(x) = \int_I K_n(x - t)\,f(t)\,dt,$$

where

$$K_n(u) = \frac{\sin^2 n\pi u}{n \sin^2 \pi u}.$$
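The closed form of $K_n$ comes from averaging the kernels $\sum_{|h| \leq m} e(hu)$ of $S_0, \ldots, S_{n-1}$: the coefficient of $e(hu)$ in the average is $1 - |h|/n$ for $|h| < n$, so that $K_n(u) = \sum_{|h|<n}(1 - |h|/n)\,e(hu)$. The following sketch (our own check, not part of the text) compares this sum with $\sin^2 n\pi u/(n\sin^2\pi u)$, and confirms numerically that $K_n \geq 0$ and $\int_I K_n(u)\,du = 1$; a midpoint sum with 64 points is exact here because $K_n$ is a trigonometric polynomial of degree less than 64.

```python
import cmath
import math


def fejer_sum(n: int, u: float) -> float:
    """K_n(u) as the weighted sum  sum_{|h|<n} (1 - |h|/n) e(hu)."""
    return sum((1 - abs(h) / n) * cmath.exp(2j * math.pi * h * u) for h in range(-n + 1, n)).real


def fejer_closed(n: int, u: float) -> float:
    """K_n(u) = sin^2(n pi u) / (n sin^2(pi u)), valid when u is not an integer."""
    return math.sin(n * math.pi * u) ** 2 / (n * math.sin(math.pi * u) ** 2)


def fejer_mean(n: int, samples: int = 64) -> float:
    """Midpoint-rule value of the integral of K_n over [0, 1)."""
    return sum(fejer_closed(n, (k + 0.5) / samples) for k in range(samples)) / samples
```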
Theorem 2 A real sequence $(\xi_n)$ is uniformly distributed mod 1 if and only if, for
every integer $h \neq 0$,

$$N^{-1} \sum_{n=1}^{N} e(h\xi_n) \to 0 \quad \text{as } N \to \infty. \tag{2}$$


Proof If the sequence $(\xi_n)$ is uniformly distributed mod 1 then, by taking $f(t) = e(ht)$
in Theorem 1 we obtain (2) since, for every integer $h \neq 0$, $e(h\{\xi_n\}) = e(h\xi_n)$ and

$$\int_I e(ht)\,dt = 0.$$

Conversely, suppose (2) holds for every nonzero integer $h$. Then, by linearity, for
any trigonometric polynomial

$$g(t) = \sum_{h=-m}^{m} b_h e(ht)$$

we have

$$N^{-1} \sum_{n=1}^{N} g(\xi_n) \to b_0 = \int_I g(t)\,dt \quad \text{as } N \to \infty.$$
If $f$ is a continuous function of period 1 then, by the Weierstrass approximation
theorem, for any $\varepsilon > 0$ there exists a trigonometric polynomial $g(t)$ such that
$|f(t) - g(t)| < \varepsilon$ for every $t \in I$. Hence

$$\Big|N^{-1} \sum_{n=1}^{N} f(\xi_n) - \int_I f(t)\,dt\Big| \leq \Big|N^{-1} \sum_{n=1}^{N} \big(f(\xi_n) - g(\xi_n)\big)\Big| + \Big|N^{-1} \sum_{n=1}^{N} g(\xi_n) - \int_I g(t)\,dt\Big| + \Big|\int_I \big(g(t) - f(t)\big)\,dt\Big|$$
$$< 2\varepsilon + \Big|N^{-1} \sum_{n=1}^{N} g(\xi_n) - \int_I g(t)\,dt\Big| < 3\varepsilon \quad \text{for all large } N.$$

Thus (1) holds for every continuous function $f$ of period 1.
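Finally, if $\chi_{\alpha,\beta}$ is the function defined in the proof of Theorem 1 then, for any
$\varepsilon > 0$, there exist continuous functions $f_1, f_2$ of period 1 such that

$$f_1(t) \leq \chi_{\alpha,\beta}(t) \leq f_2(t) \quad \text{for every } t \in I$$

and

$$\int_I (f_2(t) - f_1(t))\,dt < \varepsilon,$$

from which it follows similarly that

$$N^{-1} \sum_{n=1}^{N} \chi_{\alpha,\beta}(\{\xi_n\}) \to \int_I \chi_{\alpha,\beta}(t)\,dt \quad \text{as } N \to \infty.$$

Thus the sequence $(\xi_n)$ is uniformly distributed mod 1.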


Weyl’s criterion, as Theorem 2 is usually called, immediately implies Bohl’s result:

Proposition 3 If $\xi$ is irrational, the sequence $(n\xi)$ is uniformly distributed mod 1.

Proof For any nonzero integer $h$, $h\xi$ is irrational, so $e(h\xi) \neq 1$ and

$$e(h\xi) + e(2h\xi) + \cdots + e(Nh\xi) = \big(e((N+1)h\xi) - e(h\xi)\big)/\big(e(h\xi) - 1\big).$$

Hence

$$\Big|N^{-1} \sum_{n=1}^{N} e(hn\xi)\Big| \leq 2\,|e(h\xi) - 1|^{-1} N^{-1},$$

and the result follows from Theorem 2.
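The bound in this proof can be observed directly. The sketch below (our own illustration) evaluates $|N^{-1}\sum_{n=1}^N e(hn\xi)|$ for $\xi = \sqrt{2}$ and compares it with $2|e(h\xi) - 1|^{-1}N^{-1}$. For the rational $\xi = 1/3$ and $h = 3$ every term equals 1, so the average does not tend to 0, in agreement with Theorem 2, since $(n/3)$ is not uniformly distributed mod 1.

```python
import cmath
import math


def e(t: float) -> complex:
    return cmath.exp(2j * math.pi * t)     # e(t) = e^{2 pi i t}


def weyl_average(xi: float, h: int, N: int) -> float:
    """|N^{-1} sum_{n=1}^{N} e(h n xi)|"""
    return abs(sum(e(h * n * xi) for n in range(1, N + 1))) / N


def geometric_bound(xi: float, h: int, N: int) -> float:
    """The bound 2 |e(h xi) - 1|^{-1} N^{-1} from the proof of Proposition 3."""
    return 2 / (abs(e(h * xi) - 1) * N)
```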
These results can be immediately extended to higher dimensions. A sequence
$(x_n)$ of vectors in $\mathbb{R}^d$ is said to be uniformly distributed mod 1 if, for all vectors
$\alpha = (\alpha_1, \ldots, \alpha_d)$ and $\beta = (\beta_1, \ldots, \beta_d)$ with $0 \leq \alpha_k < \beta_k \leq 1$ $(k = 1, \ldots, d)$,

$$\varphi_{\alpha,\beta}(N)/N \to \prod_{k=1}^{d} (\beta_k - \alpha_k) \quad \text{as } N \to \infty,$$

where $x_n = (\xi_n^{(1)}, \ldots, \xi_n^{(d)})$ and $\varphi_{\alpha,\beta}(N)$ is the number of positive integers $n \leq N$
such that $\alpha_k \leq \{\xi_n^{(k)}\} < \beta_k$ for every $k \in \{1, \ldots, d\}$. Let $I^d$ be the set of all
$x = (\xi^{(1)}, \ldots, \xi^{(d)})$ such that $0 \leq \xi^{(k)} \leq 1$ $(k = 1, \ldots, d)$ and, for an arbitrary
vector $x = (\xi^{(1)}, \ldots, \xi^{(d)})$, put

$$\{x\} = (\{\xi^{(1)}\}, \ldots, \{\xi^{(d)}\}).$$

Then Theorems 1 and 2 have the following generalizations:


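Theorem 1’ A sequence $(x_n)$ of vectors in $\mathbb{R}^d$ is uniformly distributed mod 1 if and
only if, for every function $f : I^d \to \mathbb{C}$ which is Riemann integrable,

$$N^{-1}\sum_{n=1}^{N} f(\{x_n\}) \to \int_I \cdots \int_I f(t_1, \ldots, t_d)\,dt_1 \cdots dt_d \quad \text{as } N \to \infty.$$

Theorem 2’ A sequence $(x_n)$ of vectors in $\mathbb{R}^d$ is uniformly distributed mod 1 if and
only if, for every nonzero vector $m = (\mu_1, \ldots, \mu_d) \in \mathbb{Z}^d$,

$$N^{-1}\sum_{n=1}^{N} e(m \cdot x_n) \to 0 \quad \text{as } N \to \infty,$$

where $m \cdot x_n = \mu_1 \xi_n^{(1)} + \cdots + \mu_d \xi_n^{(d)}$.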
Proposition 3 can also be generalized in the following way:

Proposition 3’ If $x = (\xi^{(1)}, \ldots, \xi^{(d)})$ is any vector in $\mathbb{R}^d$ such that $1, \xi^{(1)}, \ldots, \xi^{(d)}$
are linearly independent over the field $\mathbb{Q}$ of rational numbers, then the sequence $(nx)$
is uniformly distributed mod 1.


In particular, the sequence $(\{nx\}) = (\{n\xi^{(1)}\}, \ldots, \{n\xi^{(d)}\})$ is dense in the
$d$-dimensional unit cube if $1, \xi^{(1)}, \ldots, \xi^{(d)}$ are linearly independent over the field $\mathbb{Q}$
of rational numbers. This much weaker assertion had already been proved before Weyl
by Kronecker (1884).

It is easily seen that the linear independence of $1, \xi^{(1)}, \ldots, \xi^{(d)}$ over the field $\mathbb{Q}$ of
rational numbers is also necessary for the sequence $(\{nx\})$ to be dense in the
$d$-dimensional unit cube and, a fortiori, for the sequence $(nx)$ to be uniformly distributed mod 1. For if $1, \xi^{(1)}, \ldots, \xi^{(d)}$ are linearly dependent over $\mathbb{Q}$ there exists a
nonzero vector $m = (\mu_1, \ldots, \mu_d) \in \mathbb{Z}^d$ such that

$$m \cdot x = \mu_1 \xi^{(1)} + \cdots + \mu_d \xi^{(d)} \in \mathbb{Z}.$$
It follows that each point of the sequence $(\{nx\})$ lies on some hyperplane $m \cdot y = h$,
where $h \in \mathbb{Z}$. Without loss of generality, suppose $\mu_1 \ne 0$. Then no point of the
$d$-dimensional unit cube which is sufficiently close to the point $(1/(2|\mu_1|), 0, \ldots, 0)$
lies on such a hyperplane, since $m \cdot y = \pm 1/2$ at that point.

We now return to the one-dimensional case. Weyl used Theorem 2 to prove, not 
only Proposition 3, but also a deeper result concerning the uniform distribution of 
the sequence (f(n)), where f is a polynomial of any positive degree. We will derive 
Weyl’s result by a more general argument due to van der Corput (1931), based on the 
following inequality: 


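Lemma 4 If $\zeta_1, \ldots, \zeta_N$ are arbitrary complex numbers then, for any positive integer
$M \le N$,

$$M^2 \Big| \sum_{n=1}^{N} \zeta_n \Big|^2 \le M(M+N-1) \sum_{n=1}^{N} |\zeta_n|^2 + 2(M+N-1) \sum_{m=1}^{M-1} (M-m)\, \Re \sum_{n=1}^{N-m} \bar\zeta_n \zeta_{n+m}.$$

Proof Put $\zeta_n = 0$ if $n \le 0$ or $n > N$. Then it is easily verified that

$$M \sum_{n=1}^{N} \zeta_n = \sum_{h=1}^{M+N-1} \Big( \sum_{k=0}^{M-1} \zeta_{h-k} \Big).$$

Applying Schwarz’s inequality (Chapter I, §4), we get

$$\begin{aligned}
M^2 \Big| \sum_{n=1}^{N} \zeta_n \Big|^2 &\le (M+N-1) \sum_{h=1}^{M+N-1} \Big| \sum_{k=0}^{M-1} \zeta_{h-k} \Big|^2 \\
&= (M+N-1) \sum_{h=1}^{M+N-1} \sum_{j,k=0}^{M-1} \zeta_{h-j} \bar\zeta_{h-k}.
\end{aligned}$$

On the right side any term $|\zeta_n|^2$ occurs exactly $M$ times, namely for $h-k = h-j = n$.
A term $\zeta_n \bar\zeta_{n+m}$ or $\bar\zeta_n \zeta_{n+m}$, where $m > 0$, occurs only if $m < M$ and then it occurs
exactly $M - m$ times. Thus the right side is equal to

$$M(M+N-1) \sum_{n=1}^{N} |\zeta_n|^2 + (M+N-1) \sum_{m=1}^{M-1} (M-m) \sum_{n=1}^{N-m} \big(\zeta_n \bar\zeta_{n+m} + \bar\zeta_n \zeta_{n+m}\big).$$

Since $\zeta_n \bar\zeta_{n+m} + \bar\zeta_n \zeta_{n+m} = 2\,\Re(\bar\zeta_n \zeta_{n+m})$, the lemma follows.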
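Lemma 4 is easy to test numerically. The Python sketch below evaluates both sides for random complex numbers and every admissible $M \le N$. The function names `vdc_lhs` and `vdc_rhs` are hypothetical. Because the lemma is an inequality, the check allows only a small relative tolerance for rounding.

```python
import random


def vdc_lhs(z: list[complex], M: int) -> float:
    """Left side of Lemma 4: M^2 |sum_n zeta_n|^2."""
    return M * M * abs(sum(z)) ** 2


def vdc_rhs(z: list[complex], M: int) -> float:
    """Right side of Lemma 4.

    M(M+N-1) sum_n |zeta_n|^2
      + 2(M+N-1) sum_{m=1}^{M-1} (M-m) Re sum_{n=1}^{N-m} conj(zeta_n) zeta_{n+m}.
    Python lists are 0-based, so z[i] plays the role of zeta_{i+1}.
    """
    N = len(z)
    total = M * (M + N - 1) * sum(abs(w) ** 2 for w in z)
    for m in range(1, M):
        correlation = sum(z[i].conjugate() * z[i + m] for i in range(N - m))
        total += 2 * (M + N - 1) * (M - m) * correlation.real
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    z = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
    for M in (1, 5, 20, 50):
        print(M, vdc_lhs(z, M), vdc_rhs(z, M))
```

For $M = 1$ the correlation terms vanish and the inequality reduces to the Cauchy–Schwarz bound $|\sum \zeta_n|^2 \le N \sum |\zeta_n|^2$.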


Corollary 5 If $(\xi_n)$ is a real sequence such that, for each positive integer $m$,

$$N^{-1}\sum_{n=1}^{N} e(\xi_{n+m} - \xi_n) \to 0 \quad \text{as } N \to \infty,$$

then

$$N^{-1}\sum_{n=1}^{N} e(\xi_n) \to 0 \quad \text{as } N \to \infty.$$

Proof By taking $\zeta_n = e(\xi_n)$ in Lemma 4 we obtain, for $1 \le M \le N$,

$$\Big| N^{-1}\sum_{n=1}^{N} e(\xi_n) \Big|^2 \le 2(M+N-1)M^{-2}N^{-2} \sum_{m=1}^{M-1} (M-m) \Big| \sum_{n=1}^{N-m} e(\xi_{n+m} - \xi_n) \Big| + (M+N-1)M^{-1}N^{-1}.$$

Keeping $M$ fixed and letting $N \to \infty$, we get

$$\limsup_{N \to \infty} N^{-2} \Big| \sum_{n=1}^{N} e(\xi_n) \Big|^2 \le M^{-1}.$$

But $M$ can be chosen as large as we please.


An immediate consequence is van der Corput’s difference theorem:

Proposition 6 The real sequence $(\xi_n)$ is uniformly distributed mod 1 if, for each positive integer $m$, the sequence $(\xi_{n+m} - \xi_n)$ is uniformly distributed mod 1.

Proof If the sequences $(\xi_{n+m} - \xi_n)$ are uniformly distributed mod 1 then, by
Theorem 2,

$$N^{-1}\sum_{n=1}^{N} e\big(h(\xi_{n+m} - \xi_n)\big) \to 0 \quad \text{as } N \to \infty$$

for all integers $h \ne 0$, $m > 0$. Replacing $\xi_n$ by $h\xi_n$ in Corollary 5 we obtain, for all
integers $h \ne 0$,

$$N^{-1}\sum_{n=1}^{N} e(h\xi_n) \to 0 \quad \text{as } N \to \infty.$$

Hence, by Theorem 2 again, the sequence $(\xi_n)$ is uniformly distributed mod 1.

The sequence $(n\xi)$, with $\xi$ irrational, shows that we cannot replace ‘if’ by ‘if and
only if’ in the statement of Proposition 6, since its differences $m\xi$ are constant and hence not uniformly distributed mod 1. Weyl’s result will now be derived from
Proposition 6:


Proposition 7 If

$$f(t) = a_r t^r + a_{r-1} t^{r-1} + \cdots + a_0$$

is any polynomial with real coefficients $a_k$ such that $a_k$ is irrational for at least one
$k > 0$, then the sequence $(f(n))$ is uniformly distributed mod 1.

Proof If $r = 1$, then the result holds by the same argument as in Proposition 3. We
assume that $r > 1$, $a_r \ne 0$ and the result holds for polynomials of degree less than $r$.
For any positive integer $m$,

$$g_m(t) = f(t+m) - f(t)$$

is a polynomial of degree $r-1$ with leading coefficient $rma_r$. If $a_r$ is irrational, then
$rma_r$ is also irrational and hence, by the induction hypothesis, the sequence $(g_m(n))$
is uniformly distributed mod 1. Consequently, by Proposition 6, the sequence $(f(n))$
is also uniformly distributed mod 1.

Suppose next that the leading coefficient $a_r$ is rational, and let $a_s$ $(1 \le s < r)$ be
the coefficient nearest to it which is irrational. Then the coefficients of $t^{r-1}, \ldots, t^s$ of
the polynomial $g_m(t)$ are rational, but the coefficient of $t^{s-1}$ is irrational. If $s > 1$,
it follows again from the induction hypothesis and Proposition 6 that the sequence
$(f(n))$ is uniformly distributed mod 1.

Suppose finally that $s = 1$ and put

$$F(t) = a_r t^r + a_{r-1} t^{r-1} + \cdots + a_2 t^2.$$

If $q > 0$ is a common denominator for the rational numbers $a_2, \ldots, a_r$ then, for any
integer $h \ne 0$ and any nonnegative integers $j, k$,

$$e\big(hF(jq + k)\big) = e\big(hF(k)\big).$$

Write $N = \ell q + \kappa$, where $\ell = \lfloor N/q \rfloor$ and $0 \le \kappa < q$. Since $f(t) = F(t) + a_1 t + a_0$,
we obtain

$$\begin{aligned}
N^{-1}\sum_{n=0}^{N-1} e\big(hf(n)\big) &= N^{-1}\sum_{k=0}^{q-1}\sum_{j=0}^{\ell-1} e\big(hf(jq+k)\big) + N^{-1}\sum_{n=\ell q}^{N-1} e\big(hf(n)\big) \\
&= N^{-1}\ell \sum_{k=0}^{q-1} e\big(hF(k)\big)\, \ell^{-1}\sum_{j=0}^{\ell-1} e\big(h(jqa_1 + ka_1 + a_0)\big) + N^{-1}\sum_{n=\ell q}^{N-1} e\big(hf(n)\big).
\end{aligned}$$

The last term tends to zero as $N \to \infty$, since the sum contains at most $q$ terms, each
of absolute value 1. Since $qa_1$ is irrational, the result for $r = 1$ and Theorem 2 show that each of the $q$ inner averages in the first term tends
to zero as $N \to \infty$; since also $N^{-1}\ell \le q^{-1}$, the whole first term tends to zero. Hence, by Theorem 2 again,
the sequence $(f(n))$ is uniformly distributed mod 1.
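Proposition 7 can be illustrated by computing the Weyl averages $|N^{-1}\sum_{n=0}^{N-1} e(hf(n))|$ for a few polynomials. This Python sketch uses a helper name and sample polynomials chosen for illustration. It contrasts $\sqrt{2}\,t^2$, which has an irrational coefficient, with $t^2/3$, whose averages stay close to $|1 + 2e(1/3)|/3 = 1/\sqrt{3}$. It also covers the case $s = 1$ of the proof with $t^2/2 + \sqrt{2}\,t$. At these sizes the floating-point values of $f(n)$ are only accurate to roughly $10^{-7}$, which is harmless here.

```python
import cmath
import math


def poly_weyl_average(coeffs: list[float], h: int, N: int) -> float:
    """Return |N^{-1} sum_{n=0}^{N-1} e(h f(n))| where f(t) = sum_k coeffs[k] t^k."""
    total = 0j
    for n in range(N):
        value = 0.0
        for c in reversed(coeffs):  # Horner's rule for f(n)
            value = value * n + c
        # only the fractional part of h*f(n) affects e(h f(n))
        total += cmath.exp(2j * math.pi * ((h * value) % 1.0))
    return abs(total) / N


if __name__ == "__main__":
    N = 20000
    examples = {
        "sqrt(2) t^2": [0.0, 0.0, math.sqrt(2)],
        "t^2/2 + sqrt(2) t": [0.0, math.sqrt(2), 0.5],
        "t^2/3 (rational)": [0.0, 0.0, 1.0 / 3.0],
    }
    for name, coeffs in examples.items():
        print(f"{name:20s} {poly_weyl_average(coeffs, 1, N):.4f}")
```

For $t^2/2 + \sqrt{2}\,t$ the sum collapses to a linear one, because $e(n^2/2) = e(n/2)$ for every integer $n$. This mirrors the reduction to the case $r = 1$ in the proof.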


An interesting extension of Proposition 6 was derived by Korobov and Postnikov
(1952):

Proposition 8 If, for every positive integer $m$, the sequence $(\xi_{n+m} - \xi_n)$ is uniformly
distributed mod 1 then, for all integers $q > 0$ and $r \ge 0$, the sequence $(\xi_{qn+r})$ is
uniformly distributed mod 1.

Proof We may suppose $q > 1$, since the assertion follows at once from Proposition 6
if $q = 1$. By Theorem 2 it is enough to show that, for every integer $m \ne 0$,

$$S := N^{-1}\sum_{n=1}^{N} e(m\xi_{qn+r}) \to 0 \quad \text{as } N \to \infty.$$

Since

$$q^{-1}\sum_{k=1}^{q} e(nk/q) = \begin{cases} 1 & \text{if } n \equiv 0 \bmod q, \\ 0 & \text{if } n \not\equiv 0 \bmod q, \end{cases}$$

we can write

$$S = (qN)^{-1}\sum_{n=1}^{qN} e(m\xi_{n+r}) \sum_{k=1}^{q} e(nk/q) = (qN)^{-1}\sum_{k=1}^{q}\sum_{n=1}^{qN} e\big(m\theta_n^{(k)}\big),$$

where we have put

$$\theta_n^{(k)} = \xi_{n+r} + nk/mq.$$

By hypothesis, for every positive integer $h$, the sequence

$$\theta_{n+h}^{(k)} - \theta_n^{(k)} = \xi_{n+h+r} - \xi_{n+r} + hk/mq$$

is uniformly distributed mod 1, since adding a constant does not affect uniform distribution mod 1. Hence $(\theta_n^{(k)})$ is uniformly distributed mod 1, by Proposition 6. Thus, for each $k \in \{1, \ldots, q\}$,

$$(qN)^{-1}\sum_{n=1}^{qN} e\big(m\theta_n^{(k)}\big) \to 0 \quad \text{as } N \to \infty,$$

and consequently also $S \to 0$ as $N \to \infty$.
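The key step in the proof of Proposition 8 is the orthogonality relation that picks out the terms with $n \equiv 0 \bmod q$. The Python sketch below evaluates $S$ directly and through the double sum over $n$ and $k$, so one can check that the two expressions agree up to rounding. The test sequence $\xi_n = \sqrt{2}\,n^2$ and the values of $m$, $q$, $r$ are arbitrary choices made for illustration.

```python
import cmath
import math
from typing import Callable


def e(x: float) -> complex:
    """e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * math.pi * x)


def s_direct(xi: Callable[[int], float], m: int, q: int, r: int, N: int) -> complex:
    """S = N^{-1} sum_{n=1}^{N} e(m xi_{qn+r})."""
    return sum(e(m * xi(q * n + r)) for n in range(1, N + 1)) / N


def s_filtered(xi: Callable[[int], float], m: int, q: int, r: int, N: int) -> complex:
    """S = (qN)^{-1} sum_{n=1}^{qN} e(m xi_{n+r}) sum_{k=1}^{q} e(nk/q).

    The inner sum over k equals q when q divides n and 0 otherwise,
    so only the terms with n = q n' survive.
    """
    total = 0j
    for n in range(1, q * N + 1):
        filter_sum = sum(e(n * k / q) for k in range(1, q + 1))
        total += e(m * xi(n + r)) * filter_sum
    return total / (q * N)


if __name__ == "__main__":
    def xi(n: int) -> float:
        return math.sqrt(2) * n * n

    print(s_direct(xi, 3, 4, 2, 500))
    print(s_filtered(xi, 3, 4, 2, 500))
```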


As an application of Proposition 8 we prove 


Proposition 9 Let $A$ be a $d \times d$ matrix of integers, no eigenvalue of which is a root
of unity. If, for some $x \in \mathbb{R}^d$, the sequence $(A^n x)$ is uniformly distributed mod 1 then,
for any integers $q > 0$ and $r \ge 0$, the sequence $(A^{qn+r} x)$ is also uniformly distributed
mod 1.

Proof It follows from Theorem 2’ that, for any nonzero vector $m \in \mathbb{Z}^d$, the scalar
sequence $\xi_n = m \cdot A^n x$ is uniformly distributed mod 1. For any positive integer $h$, the
sequence

$$\xi_{n+h} - \xi_n = m \cdot (A^h - I)A^n x = (A^h - I)^t m \cdot A^n x$$

has the same form as the sequence $\xi_n$, since the hypotheses ensure that $(A^h - I)^t m$ is
a nonzero vector in $\mathbb{Z}^d$ (no eigenvalue of $A^h$ equals 1, so $A^h - I$ is non-singular). Hence the sequence $(\xi_{n+h} - \xi_n)$ is uniformly distributed mod 1.
Therefore, by Proposition 8, the sequence $\xi_{qn+r} = m \cdot A^{qn+r} x$ is uniformly distributed
mod 1, and thus, by Theorem 2’, the sequence $(A^{qn+r} x)$ is uniformly distributed mod 1.
It may be noted that the matrix $A$ in Proposition 9 is necessarily non-singular.
For if $\det A = 0$, there exists a nonzero vector $z \in \mathbb{Z}^d$ such that $A^t z = 0$. Then,
for any $x \in \mathbb{R}^d$ and any positive integer $n$, $e(z \cdot A^n x) = e((A^t)^n z \cdot x) = 1$. Thus
$N^{-1}\sum_{n=1}^{N} e(z \cdot A^n x) = 1$ and therefore, by Theorem 2’, the sequence $(A^n x)$ is not
uniformly distributed mod 1.

Further examples of uniformly distributed sequences are provided by the following 
result, which is due to Fejér (c. 1924): 


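Proposition 10 Let $(\xi_n)$ be a sequence of real numbers such that $\eta_n := \xi_{n+1} - \xi_n$
tends to zero monotonically as $n \to \infty$. Then $(\xi_n)$ is uniformly distributed mod 1 if

$$n|\eta_n| \to \infty \quad \text{as } n \to \infty.$$

Proof By changing the signs of all $\xi_n$ we may restrict attention to the case where the
sequence $(\eta_n)$ is decreasing; then $\eta_n > 0$ for every $n$, since $n\eta_n \to \infty$. For any real numbers $\alpha, \beta$ we have

$$\begin{aligned}
|e(\alpha) - e(\beta) - 2\pi i(\alpha - \beta)e(\beta)| &= |e(\alpha - \beta) - 1 - 2\pi i(\alpha - \beta)| \\
&= 4\pi^2 \Big| \int_0^{\alpha-\beta} (\alpha - \beta - t)\,e(t)\,dt \Big| \\
&\le 4\pi^2 \Big| \int_0^{\alpha-\beta} (\alpha - \beta - t)\,dt \Big| \\
&= 2\pi^2(\alpha - \beta)^2.
\end{aligned}$$

If we take $\alpha = h\xi_{n+1}$ and $\beta = h\xi_n$, where $h$ is any nonzero integer, this yields

$$|e(h\xi_{n+1})/\eta_n - e(h\xi_n)/\eta_n - 2\pi i h\, e(h\xi_n)| \le 2\pi^2 h^2 \eta_n$$

and hence, since $1/\eta_{n+1} - 1/\eta_n \ge 0$,

$$|e(h\xi_{n+1})/\eta_{n+1} - e(h\xi_n)/\eta_n - 2\pi i h\, e(h\xi_n)| \le 1/\eta_{n+1} - 1/\eta_n + 2\pi^2 h^2 \eta_n.$$

Taking $n = 1, \ldots, N$ and adding, we obtain

$$\begin{aligned}
\Big| 2\pi h \sum_{n=1}^{N} e(h\xi_n) \Big| &\le 1/\eta_{N+1} + 1/\eta_1 + \sum_{n=1}^{N} (1/\eta_{n+1} - 1/\eta_n) + 2\pi^2 h^2 \sum_{n=1}^{N} \eta_n \\
&= 2/\eta_{N+1} + 2\pi^2 h^2 \sum_{n=1}^{N} \eta_n.
\end{aligned}$$

Thus

$$\Big| N^{-1}\sum_{n=1}^{N} e(h\xi_n) \Big| \le (\pi|h|N\eta_{N+1})^{-1} + \pi|h| N^{-1}\sum_{n=1}^{N} \eta_n.$$

But the right side of this inequality tends to zero as $N \to \infty$, since $N\eta_{N+1} \to \infty$ and
$\eta_N \to 0$ (so that also $N^{-1}\sum_{n=1}^{N} \eta_n \to 0$).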
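The final inequality in the proof of Proposition 10 can be checked numerically. The Python sketch below takes $\xi_n = \sqrt{n}$, an example chosen for illustration: here $\eta_n = \sqrt{n+1} - \sqrt{n}$ decreases to 0 and $n\eta_n \to \infty$. It compares $|N^{-1}\sum e(h\xi_n)|$ with the bound $(\pi|h|N\eta_{N+1})^{-1} + \pi|h|N^{-1}\sum_{n=1}^{N}\eta_n$. The function name `fejer_check` is hypothetical.

```python
import cmath
import math
from typing import Callable


def fejer_check(xi: Callable[[int], float], h: int, N: int) -> tuple[float, float]:
    """Return (|N^{-1} sum_{n=1}^N e(h xi_n)|, the bound from the proof of Proposition 10)."""
    average = abs(sum(cmath.exp(2j * math.pi * h * xi(n)) for n in range(1, N + 1))) / N

    def eta(n: int) -> float:
        return xi(n + 1) - xi(n)

    bound = 1 / (math.pi * abs(h) * N * eta(N + 1)) + math.pi * abs(h) * sum(
        eta(n) for n in range(1, N + 1)
    ) / N
    return average, bound


if __name__ == "__main__":
    for N in (100, 1000, 10000, 100000):
        avg, bound = fejer_check(math.sqrt, 1, N)
        print(f"N={N:6d} average={avg:.5f} bound={bound:.5f}")
```

For $\xi_n = \sqrt{n}$ the bound decays only like $N^{-1/2}$. This suggests why Proposition 10 gives uniform distribution but no sharp rate.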
By the mean value theorem, the hypotheses of Proposition 10 are certainly satisfied
if $\xi_n = f(n)$, where $f$ is a differentiable function such that $f'(t) \to 0$ monotonically
as $t \to \infty$ and $t|f'(t)| \to \infty$ as $t \to \infty$. Consequently the sequence $(\alpha n^\sigma)$ is uniformly distributed mod 1 if $\alpha \ne 0$ and $0 < \sigma < 1$, and the sequence $(\alpha(\log n)^\sigma)$ is
uniformly distributed mod 1 if $\alpha \ne 0$ and $\sigma > 1$. By using van der Corput’s difference
theorem and an inductive argument starting from Proposition 10, it may be further
shown that the sequence $(\alpha n^\sigma)$ is uniformly distributed mod 1 for any $\alpha \ne 0$ and any
$\sigma > 0$ which is not an integer.

It has been shown by Kemperman (1973) that ‘if’ may be replaced by ‘if and only
if’ in the statement of Proposition 10. Consequently the sequence $(\alpha(\log n)^\sigma)$ is not
uniformly distributed mod 1 if $0 < \sigma \le 1$.

The theory of uniform distribution has an application, and its origin, in astronomy. 
In his investigations on the secular perturbations of planetary orbits Lagrange (1782) 
was led to the problem of mean motion: if 


$$z(t) = \sum_{k=1}^{n} \rho_k e^{i(\omega_k t + \alpha_k)},$$

where $\rho_k > 0$ and $\omega_k, \alpha_k \in \mathbb{R}$ $(k = 1, \ldots, n)$, does $t^{-1}\arg z(t)$ have a finite limit
as $t \to +\infty$? It is assumed that $z(t)$ never vanishes and $\arg z(t)$ is then defined by
continuity. (Zeros of $z(t)$ can be admitted by writing $z(t) = \rho(t)e^{i\varphi(t)}$, where $\rho(t)$
and $\varphi(t)$ are continuous real-valued functions and $\rho(t)$ is required to change sign at a
zero of $z(t)$ of odd multiplicity.)

In the astronomical application arg z(t) measures the longitude of the perihelion of 
the planetary orbit. Lagrange showed that the limit 


$$w = \lim_{t \to +\infty} \frac{\arg z(t)}{t}$$

does exist when $n = 2$ and also, for arbitrary $n$, when some $\rho_j$ exceeds the sum of all
the others. The only planets which do not satisfy this second condition are Venus and 
Earth. Lagrange went on to say that, when neither of the two conditions was satisfied, 
the problem was “very difficult and perhaps impossible”. 

There was no further progress until the work of Bohl (1909), who took $n = 3$
and considered the non-Lagrangian case when there exists a triangle with sidelengths
$\rho_1, \rho_2, \rho_3$. He showed that the limit $w$ exists if $\omega_1, \omega_2, \omega_3$ are linearly independent
over the rational field $\mathbb{Q}$ and then $w = \lambda_1\omega_1 + \lambda_2\omega_2 + \lambda_3\omega_3$, where $\pi\lambda_1, \pi\lambda_2, \pi\lambda_3$
are the angles of the triangle with sidelengths $\rho_1, \rho_2, \rho_3$. In the course of the proof he
stated and proved Proposition 3 (without formulating the general concept of uniform
distribution).

Using his earlier results on uniform distribution, Weyl (1938) showed that the limit
$w$ exists if $\omega_1, \ldots, \omega_n$ are linearly independent over the rational field $\mathbb{Q}$ and then

$$w = \lambda_1\omega_1 + \cdots + \lambda_n\omega_n,$$

where $\lambda_k \ge 0$ $(k = 1, \ldots, n)$ and $\sum_{k=1}^{n} \lambda_k = 1$. The coefficients $\lambda_k$ depend only on
the $\rho$’s, not on the $\alpha$’s or $\omega$’s, and there is even an explicit expression for $\lambda_k$, involving
Bessel functions, which is derived from the theory of random walks.

Finally, it was shown by Jessen and Tornehave (1945) that the limit $w$ exists for
arbitrary $\omega_k \in \mathbb{R}$.
2 Discrepancy

The star discrepancy of a finite set of points $\xi_1, \ldots, \xi_N$ in the unit interval $I = [0, 1]$
is defined to be

$$D_N^* = D_N^*(\xi_1, \ldots, \xi_N) = \sup_{0 \le \alpha \le 1} |\varphi_\alpha(N)/N - \alpha|,$$

where $\varphi_\alpha(N) = \varphi_{0,\alpha}(N)$ denotes the number of positive integers $n \le N$ such that
$0 \le \xi_n < \alpha$. Here we will omit the qualifier ‘star’, since we will not be concerned with
any other type of discrepancy and the notation $D_N^*$ should provide adequate warning.

It was discovered only in 1972, by Niederreiter, that the preceding definition may 
be reformulated in the following simple way: 


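Proposition 11 If $\xi_1, \ldots, \xi_N$ are real numbers such that $0 \le \xi_1 \le \cdots \le \xi_N \le 1$, then

$$\begin{aligned}
D_N^* = D_N^*(\xi_1, \ldots, \xi_N) &= \max_{1 \le k \le N} \max\big(|\xi_k - k/N|,\ |\xi_k - (k-1)/N|\big) \\
&= (2N)^{-1} + \max_{1 \le k \le N} |\xi_k - (2k-1)/2N|.
\end{aligned}$$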


Proof Put $\xi_0 = 0$, $\xi_{N+1} = 1$. Since the distinct $\xi_k$ with $0 \le k \le N+1$ define a
subdivision of the unit interval $I$, we have

$$\begin{aligned}
D_N^* &= \max_{0 \le k \le N,\ \xi_k < \xi_{k+1}}\ \sup_{\xi_k < \alpha \le \xi_{k+1}} |\varphi_\alpha(N)/N - \alpha| \\
&= \max_{0 \le k \le N,\ \xi_k < \xi_{k+1}}\ \sup_{\xi_k < \alpha \le \xi_{k+1}} |k/N - \alpha|.
\end{aligned}$$

But the function $f_k(t) = |k/N - t|$ attains its maximum in the interval $\xi_k \le t \le \xi_{k+1}$
at one of the endpoints of this interval. Consequently

$$D_N^* = \max_{0 \le k \le N,\ \xi_k < \xi_{k+1}} \max\big(|k/N - \xi_k|,\ |k/N - \xi_{k+1}|\big).$$

We are going to show that in fact

$$D_N^* = \max_{0 \le k \le N} \max\big(|k/N - \xi_k|,\ |k/N - \xi_{k+1}|\big).$$

Suppose $\xi_k < \xi_{k+1} = \xi_{k+2} = \cdots = \xi_{k+r} < \xi_{k+r+1}$ for some $r \ge 2$. By
applying the same reasoning as before to the function $g_k(t) = |t - \xi_{k+1}|$ we obtain, for
$1 \le j < r$,

$$\begin{aligned}
|(k+j)/N - \xi_{k+j}| = |(k+j)/N - \xi_{k+j+1}| &= |(k+j)/N - \xi_{k+1}| \\
&\le \max\big(|k/N - \xi_{k+1}|,\ |(k+r)/N - \xi_{k+1}|\big) \\
&= \max\big(|k/N - \xi_{k+1}|,\ |(k+r)/N - \xi_{k+r}|\big).
\end{aligned}$$

Since both terms in the last maximum appear in the expression already obtained for
$D_N^*$, it follows that this expression is not altered by dropping the restriction to those $k$
for which $\xi_k < \xi_{k+1}$.

Since $|0/N - \xi_0| = |N/N - \xi_{N+1}| = 0$, we can now also write

$$D_N^* = \max_{1 \le k \le N} \max\big(|k/N - \xi_k|,\ |(k-1)/N - \xi_k|\big).$$

The second expression for $D_N^*$ follows immediately, since

$$\max\big(|k/N - \alpha|,\ |(k-1)/N - \alpha|\big) = |(k - 1/2)/N - \alpha| + 1/2N.$$

Corollary 12 If $\xi_1, \ldots, \xi_N$ are real numbers such that $0 \le \xi_1 \le \cdots \le \xi_N \le 1$,
then $D_N^* \ge (2N)^{-1}$. Moreover, equality holds if and only if $\xi_k = (2k-1)/2N$ for
$k = 1, \ldots, N$.

Thus Proposition 11 says that the discrepancy of any set of $N$ points of $I$ is
obtained by adding to its minimal value $1/2N$ the maximum deviation of the set from
the unique minimizing set, when both sets are arranged in order of magnitude.
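Proposition 11 turns the supremum in the definition of $D_N^*$ into a finite maximum. The following Python sketch computes $D_N^*$ exactly with rational arithmetic in two ways: from the closed form of Proposition 11, and by brute force over the only values of $\alpha$ at which the supremum can be approached (the points themselves, from either side, and the endpoints). It also confirms Corollary 12 for the midpoint set $\xi_k = (2k-1)/2N$. The function names are illustrative.

```python
from fractions import Fraction
import random


def discrepancy_formula(points: list[Fraction]) -> Fraction:
    """D*_N = (2N)^{-1} + max_k |xi_k - (2k-1)/2N|, with the points sorted (Proposition 11)."""
    xs = sorted(points)
    N = len(xs)
    return Fraction(1, 2 * N) + max(
        abs(x - Fraction(2 * k - 1, 2 * N)) for k, x in enumerate(xs, start=1)
    )


def discrepancy_brute(points: list[Fraction]) -> Fraction:
    """sup_{0<=alpha<=1} |phi_alpha(N)/N - alpha| with phi_alpha(N) = #{n : 0 <= xi_n < alpha}.

    Between consecutive candidate values the count is constant, so the supremum is
    approached at a candidate c, either with alpha = c (points < c are counted)
    or with alpha slightly above c (points <= c are counted).
    """
    N = len(points)
    best = Fraction(0)
    for c in set(points) | {Fraction(0), Fraction(1)}:
        below = sum(1 for x in points if x < c)
        at_or_below = sum(1 for x in points if x <= c)
        best = max(best, abs(Fraction(below, N) - c))
        if c < 1:  # alpha can exceed c inside [0, 1] only when c < 1
            best = max(best, abs(Fraction(at_or_below, N) - c))
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [Fraction(rng.randint(0, 20), 20) for _ in range(8)]
    print(discrepancy_formula(pts), discrepancy_brute(pts))
```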


The next result shows that the discrepancy $D_N^*(\xi_1, \ldots, \xi_N)$ is a continuous function of $\xi_1, \ldots, \xi_N$.

Proposition 13 If $\xi_1, \ldots, \xi_N$ and $\eta_1, \ldots, \eta_N$ are two sets of $N$ points of $I$, with the
discrepancies $D_N^*$ and $E_N^*$ respectively, then

$$|D_N^* - E_N^*| \le \max_{1 \le k \le N} |\xi_k - \eta_k|.$$

Proof Let $x_1 \le \cdots \le x_N$ and $y_1 \le \cdots \le y_N$ be the two given sets rearranged in
order of magnitude. It is enough to show that

$$\max_{1 \le k \le N} |x_k - y_k| \le \delta := \max_{1 \le k \le N} |\xi_k - \eta_k|,$$

since it then follows from Proposition 11 that

$$D_N^* \le \delta + E_N^*, \qquad E_N^* \le \delta + D_N^*.$$

Assume, on the contrary, that $|x_k - y_k| > \delta$ for some $k$. Then either $x_k > y_k + \delta$
or $y_k > x_k + \delta$. Without loss of generality we restrict attention to the first case. By
hypothesis, for each $y_i$ with $1 \le i \le k$ there exists an $x_{j_i}$ with $1 \le j_i \le N$ such that
$|y_i - x_{j_i}| \le \delta$ and such that the subscripts $j_i$ are distinct. Since $y_1 \le \cdots \le y_k$, it
follows that

$$x_{j_i} \le y_i + \delta \le y_k + \delta < x_k \quad (1 \le i \le k).$$

But this is a contradiction, since there are at most $k-1$ $x$’s less than $x_k$.


We now show how the notion of discrepancy makes it possible to obtain estimates 
for the accuracy of various methods of numerical integration. 


Proposition 14 If the function $f$ satisfies the ‘Lipschitz condition’

$$|f(\xi) - f(\eta)| \le L|\xi - \eta| \quad \text{for all } \xi, \eta \in I,$$

then for any finite set $\xi_1, \ldots, \xi_N \in I$ with discrepancy $D_N^*$,

$$\Big| N^{-1}\sum_{n=1}^{N} f(\xi_n) - \int_I f(t)\,dt \Big| \le L D_N^*.$$

Proof Without loss of generality we may assume $\xi_1 \le \cdots \le \xi_N$. Writing

$$\int_I f(t)\,dt = \sum_{n=1}^{N} \int_{(n-1)/N}^{n/N} f(t)\,dt,$$

we obtain

$$\begin{aligned}
\Big| N^{-1}\sum_{n=1}^{N} f(\xi_n) - \int_I f(t)\,dt \Big| &\le \sum_{n=1}^{N} \int_{(n-1)/N}^{n/N} |f(\xi_n) - f(t)|\,dt \\
&\le L \sum_{n=1}^{N} \int_{(n-1)/N}^{n/N} |\xi_n - t|\,dt.
\end{aligned}$$

But for $(n-1)/N \le t \le n/N$ we have

$$|\xi_n - t| \le \max\big(|\xi_n - n/N|,\ |\xi_n - (n-1)/N|\big) \le D_N^*,$$

by Proposition 11. The result follows.


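As Koksma (1942) first showed, Proposition 14 can be sharpened in the following
way:

Proposition 15 If the function $f$ has bounded variation on the unit interval $I$, with
total variation $V$, then for any finite set $\xi_1, \ldots, \xi_N \in I$ with discrepancy $D_N^*$,

$$\Big| N^{-1}\sum_{n=1}^{N} f(\xi_n) - \int_I f(t)\,dt \Big| \le V D_N^*.$$

Proof Without loss of generality we may assume $\xi_1 \le \cdots \le \xi_N$ and we put $\xi_0 = 0$,
$\xi_{N+1} = 1$. By integration and summation by parts we obtain

$$\begin{aligned}
\sum_{n=0}^{N} \int_{\xi_n}^{\xi_{n+1}} (t - n/N)\,df(t) &= \int_I t\,df(t) - \sum_{n=0}^{N} (n/N)\big(f(\xi_{n+1}) - f(\xi_n)\big) \\
&= f(1) - \int_I f(t)\,dt - f(1) + N^{-1}\sum_{n=1}^{N} f(\xi_n) \\
&= N^{-1}\sum_{n=1}^{N} f(\xi_n) - \int_I f(t)\,dt.
\end{aligned}$$

The result follows, since for $\xi_n \le t \le \xi_{n+1}$ we have, by Proposition 11,

$$|t - n/N| \le \max\big(|\xi_n - n/N|,\ |\xi_{n+1} - n/N|\big) \le D_N^*.$$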
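Koksma’s inequality (Proposition 15) can be checked in exact arithmetic. For the monotone function $f(t) = t^2$, the total variation on $I$ is $V = f(1) - f(0) = 1$ and $\int_I f(t)\,dt = 1/3$. So the error of the equal-weight rule should never exceed $D_N^*$. In the Python sketch below, the random rational node sets and the helper names are assumptions made for illustration, and the discrepancy is computed from the closed form of Proposition 11.

```python
from fractions import Fraction
import random


def star_discrepancy(points: list[Fraction]) -> Fraction:
    """Closed form of Proposition 11 for points in [0, 1]."""
    xs = sorted(points)
    N = len(xs)
    return Fraction(1, 2 * N) + max(
        abs(x - Fraction(2 * k - 1, 2 * N)) for k, x in enumerate(xs, start=1)
    )


def quadrature_error(points: list[Fraction]) -> Fraction:
    """|N^{-1} sum f(xi_n) - int_0^1 f(t) dt| for f(t) = t^2, computed exactly."""
    N = len(points)
    return abs(sum(x * x for x in points) / N - Fraction(1, 3))


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [Fraction(rng.randint(0, 100), 100) for _ in range(10)]
    print(quadrature_error(pts), "<=", star_discrepancy(pts))
```

For the midpoint set of Corollary 12 the error is $1/(12N^2)$, far below the bound $V D_N^* = 1/(2N)$. Koksma’s inequality is a worst-case estimate over all functions of variation $V$.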
As an application of Proposition 15 we prove

Proposition 16 If $\xi_1, \ldots, \xi_N$ are points of the unit interval $I$ with discrepancy $D_N^*$
then, for any integer $h \ne 0$,

$$\Big| N^{-1}\sum_{n=1}^{N} e(h\xi_n) \Big| \le 4|h| D_N^*.$$

Proof We can write

$$N^{-1}\sum_{n=1}^{N} e(h\xi_n) = \rho\, e(\alpha),$$

where $\rho \ge 0$ and $\alpha \in I$. Thus

$$\rho = N^{-1}\sum_{n=1}^{N} e(h\xi_n - \alpha).$$

Adding this relation to its complex conjugate, we obtain

$$\rho = N^{-1}\sum_{n=1}^{N} \cos 2\pi(h\xi_n - \alpha).$$

The result follows by applying Proposition 15 to the function $f(t) = \cos 2\pi(ht - \alpha)$,
which satisfies $\int_I f(t)\,dt = 0$ and has bounded variation on $I$ with total variation $\int_I |f'(t)|\,dt = 4|h|$.


An inequality in the opposite direction to Proposition 16 was obtained by Erdős
and Turán (1948) who showed that, for any positive integer $m$,

$$D_N^* \le C\Big( m^{-1} + \sum_{h=1}^{m} h^{-1} \Big| N^{-1}\sum_{n=1}^{N} e(h\xi_n) \Big| \Big),$$

where the positive constant $C$ is independent of $m$, $N$ and the $\xi$’s. Niederreiter and
Philipp (1973) showed that one can take $C = 4$. Furthermore they generalized the
result and simplified the proof.

The connection between these results and the theory of uniform distribution is
close at hand. Let $(\xi_n)$ be an arbitrary sequence of real numbers and let $\delta_N$ denote the
discrepancy of the fractional parts $\{\xi_1\}, \ldots, \{\xi_N\}$. By the remark after the definition of
uniform distribution in §1, the sequence $(\xi_n)$ is uniformly distributed mod 1 if and only
if $\delta_N \to 0$ as $N \to \infty$. It follows from Proposition 16 and the inequality of Erdős and
Turán (in which $m$ may be arbitrarily large) that $\delta_N \to 0$ as $N \to \infty$ if and only if,
for every integer $h \ne 0$,

$$N^{-1}\sum_{n=1}^{N} e(h\xi_n) \to 0 \quad \text{as } N \to \infty.$$

This provides a new proof of Theorem 2. Furthermore, from bounds for the exponential
sums we can obtain estimates for the rapidity with which $\delta_N$ tends to zero.
Propositions 14 and 15 show that in a formula for numerical integration the nodes
$(\xi_n)$ should be chosen to have as small a discrepancy as possible. For a given finite
number $N$ of nodes Corollary 12 shows how this can be achieved. In practice, however, one does not know in advance an appropriate choice of $N$, since universal error
bounds may grossly overestimate the error in a specific case. Consequently it is also of
interest to consider the problem of choosing an infinite sequence $(\xi_n)$ of nodes so that
the discrepancy $\delta_N$ of $\xi_1, \ldots, \xi_N$ tends to zero as rapidly as possible when $N \to \infty$.

There is a limit to what can be achieved in this way. W. Schmidt (1972), improving
earlier results of van Aardenne-Ehrenfest (1949) and Roth (1954), showed that there
exists an absolute constant $C > 0$ such that

$$\limsup_{N \to \infty} N\delta_N/\log N \ge C$$

for every infinite sequence $(\xi_n)$. Kuipers and Niederreiter (1974) showed that a possible value for $C$ was $(132\log 2)^{-1} = 0.0109\ldots$, which Béjian (1979) improved to
$(24\log 2)^{-1} = 0.0601\ldots$.

Schmidt’s result is best possible, apart from the value of the constant. Ostrowski
(1922) had already shown that for the sequence $(\{n\alpha\})$, where $\alpha \in (0, 1)$ is irrational,
one has

$$s^*(\alpha) := \limsup_{N \to \infty} N\delta_N/\log N < \infty$$

if in the continued fraction expansion

$$\alpha = [0; a_1, a_2, \ldots] = \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cdots}}$$

the partial quotients $a_k$ are bounded. Dupain and Sós (1984) have shown that the minimum value of $s^*(\alpha)$, for all such $\alpha$, is $(4\log(1+\sqrt{2}))^{-1} = 0.283\ldots$ and the minimum
is attained for $\alpha = \sqrt{2} - 1 = [0; 2, 2, \ldots]$. Schoissengeier (1984) has proved that, for
any irrational $\alpha \in (0, 1)$, one has $N\delta_N = O(\log N)$ if and only if the partial quotients
$a_k$ satisfy $\sum_{k=1}^{n} a_k = O(n)$.

There are other low discrepancy sequences. Haber (1966) showed that, for a
sequence $(\xi_n)$ constructed by van der Corput (1935),

$$\limsup_{N \to \infty} N\delta_N/\log N = (3\log 2)^{-1} = 0.481\ldots.$$

van der Corput’s sequence is defined as follows: if $n - 1 = a_m 2^m + \cdots + a_1 2 + a_0$,
where $a_k \in \{0, 1\}$, then $\xi_n = a_0 2^{-1} + a_1 2^{-2} + \cdots + a_m 2^{-m-1}$. In other words,
the expression for $\xi_n$ in the base 2 is obtained from that for $n - 1$ by reflection
in the ‘decimal’ point, a construction which is easily implemented on a computer.
Various generalizations of this construction have been given, and Faure (1981) defined
in this way a sequence $(\xi_n)$ for which

$$\limsup_{N \to \infty} N\delta_N/\log N = 1919\,(3454\log 12)^{-1} = 0.223\ldots$$
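Since van der Corput’s construction is easy to implement, here is a minimal Python sketch of the map $n \mapsto \xi_n$ using exact fractions. The `base` parameter is a generalization added for illustration; the text uses base 2. In base 2 the first eight terms are $0, 1/2, 1/4, 3/4, 1/8, 5/8, 3/8, 7/8$.

```python
from fractions import Fraction


def van_der_corput(n: int, base: int = 2) -> Fraction:
    """Return xi_n: write n - 1 in the given base and reflect its digits in the radix point."""
    if n < 1:
        raise ValueError("the sequence is indexed from n = 1")
    m = n - 1
    value = Fraction(0)
    scale = Fraction(1, base)
    while m > 0:
        m, digit = divmod(m, base)  # digits a_0, a_1, ... from least significant upward
        value += digit * scale      # a_k contributes a_k * base^{-(k+1)}
        scale /= base
    return value


if __name__ == "__main__":
    print([str(van_der_corput(n)) for n in range(1, 9)])
```

The first $2^k$ terms are exactly the points $j/2^k$ $(0 \le j < 2^k)$ in a scrambled order. This is one way to see why the sequence spreads out so evenly at every stage.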
Thus if $C^*$ is the least upper bound for all admissible values of $C$ in Schmidt’s
result then, by what has been said, $0.060\ldots \le C^* \le 0.223\ldots$. It is natural to ask:
what is the exact value of $C^*$, and is there a sequence $(\xi_n)$ for which it is attained?

The notion of discrepancy is easily extended to higher dimensions by defining
the discrepancy of a finite set of vectors $x_1, \ldots, x_N$ in the $d$-dimensional unit cube
$I^d = I \times \cdots \times I$ to be

$$D_N^*(x_1, \ldots, x_N) = \sup_{0 \le \alpha_k \le 1\ (k = 1, \ldots, d)} |\varphi_\alpha(N)/N - \alpha_1 \cdots \alpha_d|,$$

where $x_n = (\xi_n^{(1)}, \ldots, \xi_n^{(d)})$, $\alpha = (\alpha_1, \ldots, \alpha_d)$ and $\varphi_\alpha(N)$ is the number of positive
integers $n \le N$ such that $0 \le \xi_n^{(k)} < \alpha_k$ for every $k \in \{1, \ldots, d\}$.


For $d > 1$ there is no simple reformulation of the definition analogous to Proposition 11, but many results do carry over. In particular, Proposition 15 was generalized
and applied to the numerical evaluation of multiple integrals by Hlawka (1961/62). 
Indeed this application has greater value in higher dimensions, where other methods 
perform poorly. 

For the application one requires a set of vectors $x_1, \ldots, x_N \in I^d$ whose discrepancy $D_N^*(x_1, \ldots, x_N)$ is small. A simple procedure for obtaining such a set, which is
most useful when the integrand is smooth and has period 1 in each of its variables, is
the method of ‘good lattice points’ introduced by Korobov (1959). Here, for a suitably
chosen $g \in \mathbb{Z}^d$, one takes $x_n = \{(n-1)g/N\}$ $(n = 1, \ldots, N)$. A result of Niederreiter
(1986) implies that, for every $d \ge 2$ and every $N \ge 2$, one can choose $g$ so that

$$N D_N^* < (1 + \log N)^d + d\,2^d.$$

The van der Corput sequence has also been generalized to any finite number of
dimensions by Halton (1960). He defined an infinite sequence $(x_n)$ of vectors in $\mathbb{R}^d$ for
which

$$\limsup_{N \to \infty} N\delta_N/(\log N)^d < \infty.$$

It is conjectured that for each $d > 1$ (as for $d = 1$) there exists an absolute constant
$C_d > 0$ such that

$$\limsup_{N \to \infty} N\delta_N/(\log N)^d \ge C_d$$

for every infinite sequence $(x_n)$ of vectors in $\mathbb{R}^d$. However, the best known result
remains that of Roth (1954), in which the exponent $d$ is replaced by $d/2$.


3 Birkhoff’s Ergodic Theorem 


In statistical mechanics there is a procedure for calculating the physical properties of 
a system by simply averaging over all possible states of the system. To justify this 
procedure Boltzmann (1871) introduced what he later called the ‘ergodic hypothesis’. 
In the formulation of Maxwell (1879) this says that “the system, if left to itself in its 
actual state of motion, will, sooner or later, pass through every phase which is consistent with the equation of energy”. The word ergodic, coined by Boltzmann (1884),
was a composite of the Greek words for ‘energy’ and ‘path’. It was recognized by 
Poincaré (1894) that it was too much to ask that a path pass through every state on the 
same energy surface as its initial state, and he suggested instead that it pass arbitrarily 
close to every such state. Moreover, he observed that it would still be necessary to 
exclude certain exceptional initial states. 

A breakthrough came with the work of G.D. Birkhoff (1931), who showed that 
Lebesgue measure was the appropriate tool for treating the problem. He established a 
deep and general result which says that, apart from a set of initial states of measure 
zero, there is a definite limiting value for the proportion of time which a path spends in 
any given measurable subset B of an energy surface X. The proper formulation for the 
ergodic hypothesis was then that this limiting value should coincide with the ratio of 
the measure of B to that of X, i.e. that ‘the paths through almost all initial states should 
be uniformly distributed over arbitrary measurable sets’. It was not difficult to deduce 
that this was the case if and only if ‘any invariant measurable subset of X either had 
measure zero or had the same measure as X’. 

Birkhoff proved his theorem in the framework of classical mechanics and for flows 
with continuous time. We will prove his theorem in the abstract setting of probability 
spaces and for cascades with discrete time. The abstract formulation makes possible 
other applications, for which continuous time is not appropriate. 

Let $\mathcal{B}$ be a $\sigma$-algebra of subsets of a given set $X$, i.e. a nonempty family of subsets
of $X$ such that

(B1) the complement of any set in $\mathcal{B}$ is again a set in $\mathcal{B}$,
(B2) the union of any finite or countable collection of sets in $\mathcal{B}$ is again a set in $\mathcal{B}$.

It follows that $X \in \mathcal{B}$, since $B \in \mathcal{B}$ implies $B^c := X \setminus B \in \mathcal{B}$ and $X = B \cup B^c$.
Hence also $\emptyset = X^c \in \mathcal{B}$. Furthermore, the intersection of any finite or countable
collection of sets in $\mathcal{B}$ is again a set in $\mathcal{B}$, since $\bigcap_n B_n = X \setminus \big(\bigcup_n B_n^c\big)$. Hence if
$A, B \in \mathcal{B}$, then

$$B \setminus A = B \cap A^c \in \mathcal{B}$$

and the symmetric difference

$$A \,\triangle\, B := (B \setminus A) \cup (A \setminus B) \in \mathcal{B}.$$

The family of all subsets of $X$ is certainly a $\sigma$-algebra. Furthermore, the intersection of any collection of $\sigma$-algebras is again a $\sigma$-algebra. It follows that, for any family
$\mathcal{A}$ of subsets of $X$, there is a $\sigma$-algebra $\sigma(\mathcal{A})$ which contains $\mathcal{A}$ and is contained
in every $\sigma$-algebra which contains $\mathcal{A}$. We call $\sigma(\mathcal{A})$ the $\sigma$-algebra of subsets of $X$
generated by $\mathcal{A}$.

Suppose $\mathcal{B}$ is a $\sigma$-algebra of subsets of $X$ and a function $\mu : \mathcal{B} \to \mathbb{R}$ is defined
such that

(Pr1) $\mu(B) \ge 0$ for every $B \in \mathcal{B}$,

(Pr2) $\mu(X) = 1$,

(Pr3) if $(B_n)$ is a sequence of pairwise disjoint sets in $\mathcal{B}$, then $\mu\big(\bigcup_n B_n\big) = \sum_n \mu(B_n)$.
Then $\mu$ is said to be a probability measure and the triple $(X, \mathcal{B}, \mu)$ is said to be a
probability space.
It is easily seen that the definition implies

(i) $\mu(\emptyset) = 0$,
(ii) $\mu(B^c) = 1 - \mu(B)$,
(iii) $\mu(A) \le \mu(B)$ if $A, B \in \mathcal{B}$ and $A \subseteq B$,
(iv) $\mu(B_n) \to \mu(B)$ if $(B_n)$ is a sequence of sets in $\mathcal{B}$ such that $B_1 \supseteq B_2 \supseteq \cdots$ and
$B = \bigcap_n B_n$.

If a property of points in a probability space $(X, \mathcal{B}, \mu)$ holds for all $x \in B$, where
$B \in \mathcal{B}$ and $\mu(B) = 1$, then the property is said to hold for ($\mu$-)almost all $x \in X$, or
simply almost everywhere (a.e.).

A function $f : X \to \mathbb{R}$ is measurable if, for every $\alpha \in \mathbb{R}$, the set
$\{x \in X : f(x) < \alpha\}$ is in $\mathcal{B}$. Let $f : X \to [0, \infty)$ be measurable and for any
partition $\mathcal{P}$ of $X$ into finitely many pairwise disjoint sets $B_1, \ldots, B_m \in \mathcal{B}$, put

$$L_{\mathcal{P}}(f) = \sum_{k=1}^{m} f_k\, \mu(B_k),$$

where $f_k = \inf\{f(x) : x \in B_k\}$. We say that $f$ is integrable if

$$\int_X f\,d\mu := \sup_{\mathcal{P}} L_{\mathcal{P}}(f) < \infty.$$

The set of all measurable functions $f : X \to \mathbb{R}$ such that $|f|$ is integrable is denoted
by $L(X, \mathcal{B}, \mu)$.

A map $T : X \to X$ is said to be a measure-preserving transformation of the probability space $(X, \mathcal{B}, \mu)$ if, for every $B \in \mathcal{B}$, the set $T^{-1}B = \{x \in X : Tx \in B\}$ is
again in $\mathcal{B}$ and $\mu(T^{-1}B) = \mu(B)$. This is equivalent to $\mu(TB) = \mu(B)$ for every
$B \in \mathcal{B}$ if the measure-preserving transformation $T$ is invertible, i.e. if $T$ is bijective
and $TB \in \mathcal{B}$ for every $B \in \mathcal{B}$. However, we do not wish to restrict attention to the
invertible case. Several important examples of measure-preserving transformations of 
probability spaces will be given in the next section. 

Birkhoff’s ergodic theorem, which is also known as the ‘individual’ or ‘pointwise’ 
ergodic theorem, has the following statement: 


Theorem 17 Let $T$ be a measure-preserving transformation of the probability space
$(X, \mathcal{B}, \mu)$. If $f \in L(X, \mathcal{B}, \mu)$ then, for almost all $x \in X$, the limit

$$f^*(x) = \lim_{n \to \infty} n^{-1}\sum_{k=0}^{n-1} f(T^k x)$$

exists and $f^*(Tx) = f^*(x)$. Moreover, $f^* \in L(X, \mathcal{B}, \mu)$ and $\int_X f^*\,d\mu = \int_X f\,d\mu$.

Proof It is sufficient to prove the theorem for nonnegative functions, since we can
write $f = f_+ - f_-$, where

$$f_+(x) = \max\{f(x), 0\}, \qquad f_-(x) = \max\{-f(x), 0\},$$

and $f_+, f_- \in L(X, \mathcal{B}, \mu)$.
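Put

$$\overline{f}(x) = \limsup_{n \to \infty} n^{-1}\sum_{k=0}^{n-1} f(T^k x), \qquad \underline{f}(x) = \liminf_{n \to \infty} n^{-1}\sum_{k=0}^{n-1} f(T^k x).$$

Then $\overline{f}$ and $\underline{f}$ are $\mathcal{B}$-measurable functions since, for any sequence $(g_n)$,

$$\limsup_{n \to \infty} g_n(x) = \inf_m \Big( \sup_{n \ge m} g_n(x) \Big), \qquad \liminf_{n \to \infty} g_n(x) = \sup_m \Big( \inf_{n \ge m} g_n(x) \Big).$$

Moreover $\overline{f}(x) = \overline{f}(Tx)$, $\underline{f}(x) = \underline{f}(Tx)$ for every $x \in X$, since

$$\frac{1}{n+1}\sum_{k=0}^{n} f(T^k x) = \frac{1}{n+1} f(x) + \frac{n}{n+1} \cdot \frac{1}{n}\sum_{k=0}^{n-1} f(T^k Tx).$$

It is sufficient to show that

$$\int_X \overline{f}\,d\mu \le \int_X f\,d\mu \le \int_X \underline{f}\,d\mu.$$

For then, since $\underline{f} \le \overline{f}$, it follows that $\underline{f}(x) = \overline{f}(x) = f^*(x)$ for $\mu$-almost all $x \in X$
and

$$\int_X f^*\,d\mu = \int_X f\,d\mu.$$

Fix some $M > 0$ and define the ‘cut-off’ function $\overline{f}_M$ by

$$\overline{f}_M(x) = \min\{M, \overline{f}(x)\}.$$

Then $\overline{f}_M$ is bounded and $\overline{f}_M(Tx) = \overline{f}_M(x)$ for every $x \in X$. Fix also any $\varepsilon > 0$. By
the definition of $\overline{f}(x)$, for each $x \in X$ there exists a positive integer $n$ such that

$$\overline{f}_M(x) \le n^{-1}\sum_{k=0}^{n-1} f(T^k x) + \varepsilon. \qquad (*)$$

Thus if $F_n$ is the set of all $x \in X$ for which $(*)$ holds and if $E_n = \bigcup_{k=1}^{n} F_k$, then
$E_1 \subseteq E_2 \subseteq \cdots$ and $X = \bigcup_{n \ge 1} E_n$. Since the sets $E_n$ are $\mathcal{B}$-measurable, we can
choose $N$ so large that $\mu(E_N) > 1 - \varepsilon/M$.
Put

$$\tilde f(x) = \begin{cases} f(x) & \text{if } x \in E_N, \\ \max\{f(x), M\} & \text{if } x \notin E_N. \end{cases}$$

Also, let $\tau(x)$ be the least positive integer $n \le N$ for which $(*)$ holds if $x \in E_N$, and
let $\tau(x) = 1$ if $x \notin E_N$. Since $\overline{f}_M$ is $T$-invariant, $(*)$ implies

$$\sum_{k=0}^{n-1} \overline{f}_M(T^k x) \le \sum_{k=0}^{n-1} f(T^k x) + n\varepsilon$$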
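and hence, since $f \le \tilde f$ everywhere and $\overline{f}_M(x) \le M \le \tilde f(x)$ if $x \notin E_N$,

$$\sum_{k=0}^{\tau(x)-1} \overline{f}_M(T^k x) \le \sum_{k=0}^{\tau(x)-1} \tilde f(T^k x) + \tau(x)\varepsilon.$$

To estimate the sum $\sum_{k=0}^{L-1} \overline{f}_M(T^k x)$ for any $L > N$, we partition it into blocks of
the form

$$\sum_{k=0}^{\tau(y)-1} \overline{f}_M(T^k y)$$

and a remainder block. More precisely, define inductively

$$n_0(x) = 0, \qquad n_k(x) = n_{k-1}(x) + \tau\big(T^{n_{k-1}(x)} x\big) \quad (k = 1, 2, \ldots)$$

and define $h$ by $n_h(x) \le L < n_{h+1}(x)$. Then

$$\begin{aligned}
\sum_{k=0}^{n_1(x)-1} \overline{f}_M(T^k x) &\le \sum_{k=0}^{n_1(x)-1} \tilde f(T^k x) + \tau(x)\varepsilon, \\
\sum_{k=n_1(x)}^{n_2(x)-1} \overline{f}_M(T^k x) &\le \sum_{k=n_1(x)}^{n_2(x)-1} \tilde f(T^k x) + \tau\big(T^{n_1(x)} x\big)\varepsilon, \\
&\;\;\vdots \\
\sum_{k=n_{h-1}(x)}^{n_h(x)-1} \overline{f}_M(T^k x) &\le \sum_{k=n_{h-1}(x)}^{n_h(x)-1} \tilde f(T^k x) + \tau\big(T^{n_{h-1}(x)} x\big)\varepsilon.
\end{aligned}$$

Since $n_h(x) \le L$, we obtain by addition

$$\sum_{k=0}^{n_h(x)-1} \overline{f}_M(T^k x) \le \sum_{k=0}^{n_h(x)-1} \tilde f(T^k x) + L\varepsilon.$$

On the other hand, since $L < n_{h+1}(x) \le n_h(x) + N$, we have

$$\sum_{k=n_h(x)}^{L-1} \overline{f}_M(T^k x) \le NM.$$

Since $\tilde f \ge 0$, it follows that

$$\sum_{k=0}^{L-1} \overline{f}_M(T^k x) \le \sum_{k=0}^{L-1} \tilde f(T^k x) + L\varepsilon + NM.$$

Dividing by $L$ and integrating over $X$, we obtain

$$\int_X \overline{f}_M\,d\mu \le \int_X \tilde f\,d\mu + \varepsilon + NM/L,$$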
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since the measure-preserving nature of T implies that, for any g € L(X, FZ, “), 


J erveney =f etsy aute 
x xX 


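Since

$$\int_X \tilde{f}\,d\mu \le \int_X f\,d\mu + \int_{X \setminus E_N} M\,d\mu \le \int_X f\,d\mu + \varepsilon,$$

it follows that

$$\int_X \overline{f}_M\,d\mu \le \int_X f\,d\mu + 2\varepsilon + NM/L.$$

Since $L$ may be chosen arbitrarily large and then $\varepsilon$ arbitrarily small, we conclude that

$$\int_X \overline{f}_M\,d\mu \le \int_X f\,d\mu.$$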


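Now letting $M \to \infty$, we obtain, by the monotone convergence theorem,

$$\int_X \overline{f}\,d\mu \le \int_X f\,d\mu.$$

The proof that

$$\int_X f\,d\mu \le \int_X \underline{f}\,d\mu$$

is similar. Given $\varepsilon > 0$, there exists for each $x \in X$ a positive integer $n$ such that

$$n^{-1} \sum_{k=0}^{n-1} f(T^k x) \le \underline{f}(x) + \varepsilon. \qquad (**)$$

If $F_n$ is the set of all $x \in X$ for which $(**)$ holds and if $E_n = F_1 \cup \cdots \cup F_n$, we can choose $N$ so large that

$$\int_{X \setminus E_N} f\,d\mu < \varepsilon.$$

Put

$$\tilde{f}(x) = \begin{cases} f(x) & \text{if } x \in E_N, \\ 0 & \text{if } x \notin E_N. \end{cases}$$

Let $\tau(x)$ be the least positive integer $n$ for which $(**)$ holds if $x \in E_N$, and $\tau(x) = 1$ otherwise. The proof now goes through in the same way as before.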
It should be noticed that the preceding proof simplifies if the function $f$ is bounded. In Birkhoff’s original formulation the function $f$ was the indicator function $\chi_B$ of an arbitrary set $B \in \mathcal{B}$. In this case the theorem says that, if $\nu_n(x)$ is the number of $k < n$ for which $T^k x \in B$, then $\lim_{n \to \infty} \nu_n(x)/n$ exists for almost all $x \in X$. That is, ‘almost every point has an average sojourn time in any measurable set’.
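As a small numerical illustration of Birkhoff’s formulation (a sketch of our own, not taken from the text), take $X = [0, 1)$ with Lebesgue measure and the rotation $Tx = x + \alpha \bmod 1$, which is measure-preserving and, for irrational $\alpha$, ergodic by Proposition 19 below. For $B = [0, 1/3)$ the ratio $\nu_n(x)/n$ should then approach $\mu(B) = 1/3$. The function `sojourn_fraction` and the particular values of $\alpha$, $x$ and $B$ are illustrative choices.

```python
import math

def sojourn_fraction(x0: float, alpha: float, a: float, b: float, n: int) -> float:
    """Return nu_n(x0)/n: the fraction of k < n with T^k x0 in [a, b),
    where T is the rotation x -> x + alpha (mod 1) of the circle [0, 1)."""
    hits = 0
    for k in range(n):
        # Compute T^k x0 directly instead of iterating, to avoid accumulated drift.
        xk = (x0 + k * alpha) % 1.0
        if a <= xk < b:
            hits += 1
    return hits / n

alpha = math.sqrt(2) - 1  # irrational rotation number (illustrative choice)
for n in (1_000, 100_000):
    print(n, sojourn_fraction(0.25, alpha, 0.0, 1 / 3, n))
```

With the rational angle $\alpha = 1/2$ the rotation is not ergodic: from $x = 0.25$ the orbit alternates between $0.25$ and $0.75$, so `sojourn_fraction(0.25, 0.5, 0.0, 1/3, 1000)` returns $0.5$. The time mean still exists, but it differs from the space mean $1/3$.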

A measure-preserving transformation $T$ of the probability space $(X, \mathcal{B}, \mu)$ is said to be ergodic if, for every $B \in \mathcal{B}$ with $T^{-1}B = B$, either $\mu(B) = 0$ or $\mu(B) = 1$. Part (ii) of the next proposition says that this is the case if and only if ‘time means and space means are equal’.


Proposition 18 Let $T$ be a measure-preserving transformation of the probability space $(X, \mathcal{B}, \mu)$. Then $T$ is ergodic if and only if one of the following equivalent properties holds:

(i) if $f \in L(X, \mathcal{B}, \mu)$ satisfies $f(Tx) = f(x)$ almost everywhere, then $f$ is constant almost everywhere;

(ii) if $f \in L(X, \mathcal{B}, \mu)$ then, for almost all $x \in X$,

$$n^{-1} \sum_{k=0}^{n-1} f(T^k x) \to \int_X f\,d\mu \quad \text{as } n \to \infty;$$

(iii) if $A, B \in \mathcal{B}$, then

$$n^{-1} \sum_{k=0}^{n-1} \mu(T^{-k}A \cap B) \to \mu(A)\mu(B) \quad \text{as } n \to \infty;$$

(iv) if $C \in \mathcal{B}$ and $\mu(C) > 0$, then $\mu\big(\bigcup_{n \ge 1} T^{-n}C\big) = 1$;

(v) if $A, B \in \mathcal{B}$ and $\mu(A) > 0$, $\mu(B) > 0$, then $\mu(T^{-n}A \cap B) > 0$ for some $n > 0$.


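Proof Suppose first that $T$ is ergodic and let $f \in L(X, \mathcal{B}, \mu)$ satisfy $f(Tx) = f(x)$ a.e. Put

$$\overline{f}(x) = \limsup_{n \to \infty} n^{-1} \sum_{k=0}^{n-1} f(T^k x).$$

Then $\overline{f}(Tx) = \overline{f}(x)$ for every $x \in X$ and $\overline{f}(x) = f(x)$ a.e. For any $\alpha \in \mathbb{R}$, let

$$A_\alpha = \{x \in X : \overline{f}(x) < \alpha\}.$$

Then $\mu(A_\alpha) = 0$ or $1$, since $T^{-1}A_\alpha = A_\alpha$ and $T$ is ergodic. Since $\mu(A_\alpha)$ is a nondecreasing function of $\alpha$ and $\mu(A_\alpha) \to 0$ as $\alpha \to -\infty$, $\mu(A_\alpha) \to 1$ as $\alpha \to +\infty$, there exists $\beta \in \mathbb{R}$ such that $\mu(A_\alpha) = 0$ for $\alpha < \beta$ and $\mu(A_\alpha) = 1$ for $\alpha > \beta$. It follows that $\mu(A_\beta) = 0$ and $\mu(B_\beta) = 1$, where

$$B_\beta = \{x \in X : \overline{f}(x) \le \beta\}.$$

Hence $f(x) = \beta$ a.e. and (i) holds.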
Suppose now that (i) holds and let $f \in L(X, \mathcal{B}, \mu)$. Then the function $f^*$ in the statement of Theorem 17 must be constant a.e. Moreover, if $\gamma$ is its constant value, we must have

$$\gamma = \int_X f^*\,d\mu = \int_X f\,d\mu.$$

Thus (i) implies (ii).


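Suppose next that (ii) holds and let $A, B \in \mathcal{B}$. Then, for almost all $x \in X$,

$$\lim_{n \to \infty} n^{-1} \sum_{k=0}^{n-1} \chi_A(T^k x) = \int_X \chi_A\,d\mu = \mu(A).$$

Hence, for almost all $x \in X$,

$$\lim_{n \to \infty} n^{-1} \sum_{k=0}^{n-1} \chi_A(T^k x)\,\chi_B(x) = \mu(A)\,\chi_B(x)$$

and so, by the dominated convergence theorem,

$$\begin{aligned} \mu(A)\mu(B) &= \int_X \lim_{n \to \infty} n^{-1} \sum_{k=0}^{n-1} \chi_A(T^k x)\,\chi_B(x)\,d\mu(x) \\ &= \lim_{n \to \infty} n^{-1} \sum_{k=0}^{n-1} \int_X \chi_A(T^k x)\,\chi_B(x)\,d\mu(x) \\ &= \lim_{n \to \infty} n^{-1} \sum_{k=0}^{n-1} \mu(T^{-k}A \cap B). \end{aligned}$$

Thus (ii) implies (iii).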

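Suppose now that (iii) holds and choose $C \in \mathcal{B}$ with $\mu(C) > 0$. Put $A = \bigcup_{n \ge 0} T^{-n}C$ and $B = \big(\bigcup_{n \ge 1} T^{-n}C\big)^c$. Then, for every $k \ge 1$, $T^{-k}A \subseteq \bigcup_{n \ge 1} T^{-n}C$ and hence $\mu(T^{-k}A \cap B) = 0$. Thus

$$n^{-1} \sum_{k=0}^{n-1} \mu(T^{-k}A \cap B) = \mu(A \cap B)/n \to 0 \quad \text{as } n \to \infty.$$

Since $\mu(A) \ge \mu(C) > 0$, it follows from (iii) that $\mu(B) = 0$. Thus (iii) implies (iv).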

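Next choose any $A, B \in \mathcal{B}$ such that $\mu(A) > 0$, $\mu(B) > 0$. If (iv) holds, then $\mu\big(\bigcup_{n \ge 1} T^{-n}A\big) = 1$ and hence

$$\mu(B) = \mu\Big(B \cap \bigcup_{n \ge 1} T^{-n}A\Big) = \mu\Big(\bigcup_{n \ge 1} (B \cap T^{-n}A)\Big).$$

Since $\mu(B) > 0$, it follows that $\mu(B \cap T^{-n}A) > 0$ for some $n > 0$. Thus (iv) implies (v).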

Finally choose $A \in \mathcal{B}$ with $T^{-1}A = A$ and put $B = A^c$. Then, for every $n \ge 1$, we have $\mu(T^{-n}A \cap B) = \mu(A \cap B) = 0$. If (v) holds, it follows that either $\mu(A) = 0$ or $\mu(B) = 0$. Hence (v) implies that $T$ is ergodic.
4 Applications


We now give some examples to illustrate the general concepts and results of the 
previous section. 


(i) Suppose $X = \mathbb{R}^d/\mathbb{Z}^d$ is a $d$-dimensional torus, $\mathcal{B}$ is the family of Borel subsets of $X$ (i.e., the $\sigma$-algebra of subsets generated by the family of open sets), and $\mu = \lambda$ is Lebesgue measure, i.e. $\mu(B) = \int_X \chi_B(x)\,dx$ for any $B \in \mathcal{B}$, where $\chi_B$ is the indicator function of $B$. Every $x \in X$ is represented by a unique vector $(\xi_1, \dots, \xi_d)$, where $0 \le \xi_k < 1$ $(k = 1, \dots, d)$, and $X$ is an abelian group with addition $z = x + y$ defined by $\zeta_k = \xi_k + \eta_k \bmod 1$ $(k = 1, \dots, d)$.

For any $a \in X$, the translation $T_a : X \to X$ defined by $T_a x = x + a$ is a measure-preserving transformation of the probability space $(X, \mathcal{B}, \lambda)$.


Proposition 19 The translation $T_a : X \to X$ of the $d$-dimensional torus $X = \mathbb{R}^d/\mathbb{Z}^d$ is ergodic if and only if $1, \alpha_1, \dots, \alpha_d$ are linearly independent over the rational field $\mathbb{Q}$, where $(\alpha_1, \dots, \alpha_d)$ is the vector which represents $a$.

Proof Suppose first that $1, \alpha_1, \dots, \alpha_d$ are not linearly independent over $\mathbb{Q}$. Then there exists a nonzero vector $n = (\nu_1, \dots, \nu_d) \in \mathbb{Z}^d$ such that

$$n \cdot a = \nu_1\alpha_1 + \cdots + \nu_d\alpha_d \in \mathbb{Z}.$$

Hence if $f(x) = e(n \cdot x)$, then $f(T_a x) = f(x)$ for all $x$. Since $f$ is not constant a.e., it follows from part (i) of Proposition 18 that $T_a$ is not ergodic.

Suppose on the other hand that $1, \alpha_1, \dots, \alpha_d$ are linearly independent over $\mathbb{Q}$ and let $f$ be an integrable function such that $f(T_a x) = f(x)$ a.e. Then $f(T_a x)$ and $f(x)$ have the same Fourier coefficients:

$$\int_X f(x)\,e(-n \cdot x)\,dx = \int_X f(x + a)\,e(-n \cdot x)\,dx = e(n \cdot a) \int_X f(x)\,e(-n \cdot x)\,dx.$$

Since $e(n \cdot a) \ne 1$ for all $n \ne 0$, it follows that

$$\int_X f(x)\,e(-n \cdot x)\,dx = 0 \quad \text{for all } n \ne 0.$$

Thus $f$ has the same Fourier coefficients as the constant function $\int_X f(x)\,dx$. Since integrable functions with the same Fourier coefficients must agree almost everywhere, this proves that $f$ is constant a.e. Hence, by Proposition 18 again, $T_a$ is ergodic.
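The dichotomy in this proof can be observed numerically on the 2-torus. The sketch below is our own illustration: `character_average` is a hypothetical name, and the vectors $a = (\sqrt 2, \sqrt 3)$ and $a = (\sqrt 2, \sqrt 2)$ are illustrative choices. It averages the character $e(n \cdot x) = e^{2\pi i\, n \cdot x}$ along the orbit $x, T_a x, T_a^2 x, \dots$ for $n = (1, -1)$. For $a = (\sqrt 2, \sqrt 2)$ we have $n \cdot a = 0 \in \mathbb{Z}$, so the character is $T_a$-invariant and the average keeps absolute value 1. For $a = (\sqrt 2, \sqrt 3)$, where $1, \sqrt 2, \sqrt 3$ are linearly independent over $\mathbb{Q}$, the average should tend to $\int_X e(n \cdot x)\,dx = 0$.

```python
import cmath
import math

def character_average(n_vec, a, x, N):
    """Return N^{-1} * sum_{k<N} e(n . (x + k a)) on the torus R^d/Z^d,
    where e(t) = exp(2 pi i t)."""
    total = 0j
    for k in range(N):
        t = sum(nj * (xj + k * aj) for nj, xj, aj in zip(n_vec, x, a))
        total += cmath.exp(2j * math.pi * t)
    return total / N

x0 = (0.1, 0.7)
n_vec = (1, -1)
# Independent case: 1, sqrt(2), sqrt(3) are linearly independent over Q.
indep = character_average(n_vec, (math.sqrt(2), math.sqrt(3)), x0, 20_000)
# Dependent case: n . a = sqrt(2) - sqrt(2) = 0, so e(n . x) is T_a-invariant.
dep = character_average(n_vec, (math.sqrt(2), math.sqrt(2)), x0, 20_000)
print(abs(indep), abs(dep))
```

In the independent case the partial sums form a geometric series with ratio $e(\sqrt 2 - \sqrt 3) \ne 1$, so the average is $O(1/N)$.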


If we compare Proposition 3′ and the remarks after its proof with Proposition 19, then we see from Theorems 1′–2′ and Proposition 18 that the following five statements are equivalent for $X = \mathbb{R}^d/\mathbb{Z}^d$ and any $a \in X$:

(α) the sequence $(\{na\})$ is dense in $X$;

(β) for every $x \in X$, the sequence $(x + na)$ is uniformly distributed in $X$;

(γ) the translation $T_a : X \to X$ is ergodic;

(δ) for each continuous function $f : X \to \mathbb{C}$, $\lim_{n \to \infty} n^{-1} \sum_{k=0}^{n-1} f(T_a^k x) = \int_X f\,d\lambda$ for all $x \in X$;

(ε) for each function $f \in L(X, \mathcal{B}, \lambda)$, $\lim_{n \to \infty} n^{-1} \sum_{k=0}^{n-1} f(T_a^k x) = \int_X f\,d\lambda$ for almost all $x \in X$.


(ii) Again suppose $X = \mathbb{R}^d/\mathbb{Z}^d$ is a $d$-dimensional torus, $\mathcal{B}$ is the family of Borel subsets of $X$ and $\mu = \lambda$ is Lebesgue measure. For any $d \times d$ matrix $A = (\alpha_{jk})$ of integers, let $R_A : X \to X$ be the map defined by $R_A x = x'$, where

$$\xi'_j = \sum_{k=1}^{d} \alpha_{jk}\xi_k \bmod 1 \quad (j = 1, \dots, d).$$

If $\det A = 0$ then $R_A$ is not measure-preserving, since the image of $\mathbb{R}^d$ under $A$ is contained in a hyperplane of $\mathbb{R}^d$. However, if $\det A \ne 0$ then $R_A$ is measure-preserving, since each point of $X$ is the image under $R_A$ of $|\det A|$ distinct points of $X$, and a small region $B$ of $X$ is the image under $R_A$ of $|\det A|$ disjoint regions, each with volume $|\det A|^{-1}$ times that of $B$. (This argument is certainly valid if $A$ is a diagonal matrix, and the general case may be reduced to this by Proposition III.41.) Thus $R_A$ is an endomorphism of the torus $\mathbb{R}^d/\mathbb{Z}^d$ if and only if $A$ is nonsingular, and an automorphism if and only if $\det A = \pm 1$.
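The count of $|\det A|$ preimages can be checked exactly for $2 \times 2$ matrices. The sketch below is our own illustration, and `preimages` is a hypothetical helper. It uses the fact that the preimages of $y$ are the points $A^{-1}(y + k) \bmod 1$ with $k \in \mathbb{Z}^2$. Since $A^{-1} = \operatorname{adj} A/\det A$, it suffices to let $k$ range over $0 \le k_1, k_2 < |\det A|$.

```python
from fractions import Fraction
from itertools import product

def preimages(A, y):
    """Return the set of points x of the torus R^2/Z^2 (coordinates in [0, 1))
    with R_A x = y, for a nonsingular 2 x 2 integer matrix A and rational y."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("A must be nonsingular")
    points = set()
    # x = A^{-1}(y + k) mod 1 with k in Z^2; A^{-1} = adj(A)/det, so k mod |det| suffices.
    for k1, k2 in product(range(abs(det)), repeat=2):
        u, v = y[0] + k1, y[1] + k2
        x1 = Fraction(d * u - b * v, det) % 1
        x2 = Fraction(-c * u + a * v, det) % 1
        points.add((x1, x2))
    return points

y = (Fraction(1, 3), Fraction(2, 5))
for A in (((2, 1), (1, 1)), ((2, 0), (0, 3)), ((1, 2), (3, 4))):
    print(A, len(preimages(A, y)))
```

Exact rational arithmetic is used so that distinct preimages are never merged or split by rounding.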


Proposition 20 The endomorphism $R_A : X \to X$ of the $d$-dimensional torus $X = \mathbb{R}^d/\mathbb{Z}^d$ is ergodic if and only if no eigenvalue of the nonsingular matrix $A$ is a root of unity.

Proof For any $n \in \mathbb{Z}^d$ we have

$$e(n \cdot R_A x) = e(n \cdot Ax) = e(Dn \cdot x),$$

where $D = A^t$ is the transpose of $A$.

Suppose first that $A$, and hence also $D$, has an eigenvalue $\omega$ which is a root of unity: $\omega^p = 1$ for some positive integer $p$. Then $(D^p - I)z = 0$ for some nonzero vector $z$. Moreover, since $D$ is a matrix of integers, we may assume that $z = m \in \mathbb{Z}^d$. We may further assume that $D^i m \ne D^j m$ for $0 \le i < j < p$, by choosing $p$ to have its least possible value. If we put

$$f(x) = e(m \cdot x) + e(m \cdot Ax) + \cdots + e(m \cdot A^{p-1}x),$$

then $f(R_A x) = f(x)$, but $f$ is not constant a.e. Hence $R_A$ is not ergodic, by Proposition 18.
Suppose next that $R_A$ is not ergodic. Then, by Proposition 18 again, there exists a function $f \in L(X, \mathcal{B}, \lambda)$ such that $f(R_A x) = f(x)$ a.e., but $f(x)$ is not constant a.e. If the Fourier series of $f(x)$ is

$$\sum_{n \in \mathbb{Z}^d} c_n\,e(n \cdot x),$$

then the Fourier series of $f(R_A x)$ is

$$\sum_{n \in \mathbb{Z}^d} c_n\,e(n \cdot Ax) = \sum_{n \in \mathbb{Z}^d} c_n\,e(Dn \cdot x) = \sum_{n \in \mathbb{Z}^d} c_{D^{-1}n}\,e(n \cdot x),$$

where $c_{D^{-1}n}$ is to be read as $0$ if $D^{-1}n \notin \mathbb{Z}^d$, and hence

$$c_n = c_{D^{-1}n} \quad \text{for every } n \in \mathbb{Z}^d.$$

But $c_m \ne 0$ for some nonzero $m \in \mathbb{Z}^d$, since $f$ is not constant a.e., and $|c_n| \to 0$ as $|n| \to \infty$, since $f \in L(X, \mathcal{B}, \lambda)$. Since $c_{D^k m} = c_m$ for every positive integer $k$, it follows that the subscripts $D^k m$ are not all distinct. Hence, as $D$ is nonsingular, $D^p m = m$ for some positive integer $p$ and $A$ has an eigenvalue which is a root of unity.
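Proposition 20 turns ergodicity of $R_A$ into a finite check on $A$. One possible test, given as our own sketch with hypothetical function names, mirrors the equation $(D^p - I)z = 0$ of the proof. A primitive $p$-th root of unity has degree $\varphi(p)$ over $\mathbb{Q}$, so it can be an eigenvalue of a $d \times d$ integer matrix only if $\varphi(p) \le d$, and the elementary bound $\varphi(p) \ge \sqrt{p/2}$ then gives $p \le 2d^2$. It is therefore enough to test whether $\det(A^p - I) = 0$ for $p = 1, \dots, 2d^2$. This determinant equals $\det(D^p - I)$, and exact rational arithmetic avoids rounding.

```python
from fractions import Fraction

def det(M):
    """Exact determinant of a square integer matrix by Gaussian elimination over Q."""
    M = [[Fraction(v) for v in row] for row in M]
    n = len(M)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            result = -result
        result *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= factor * M[col][c]
    return result

def mat_mul(X, Y):
    """Product of two integer matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def has_root_of_unity_eigenvalue(A):
    """True if some eigenvalue of the d x d integer matrix A is a root of unity,
    tested via det(A^p - I) == 0 for p = 1, ..., 2 d^2."""
    d = len(A)
    power = [list(row) for row in A]
    for p in range(1, 2 * d * d + 1):
        shifted = [[power[i][j] - (1 if i == j else 0) for j in range(d)] for i in range(d)]
        if det(shifted) == 0:
            return True
        power = mat_mul(power, A)
    return False

# The 'cat map' matrix gives an ergodic endomorphism; a quarter-turn does not.
print(has_root_of_unity_eigenvalue([[2, 1], [1, 1]]))
print(has_root_of_unity_eigenvalue([[0, -1], [1, 0]]))
```

For the quarter-turn matrix, $A^4 = I$, so the eigenvalues $\pm i$ are detected at $p = 4$.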


(There are generalizations of Propositions 19 and 20 to translations and endomor- 
phisms of any compact abelian group X, with Haar measure in place of Lebesgue 
measure.) 

The preceding results have an application to the theory of ‘normal numbers’. In fact, without any extra effort, we will consider also higher-dimensional generalizations. A vector $x \in \mathbb{R}^d$ is said to be normal with respect to the matrix $A$, where $A$ is a $d \times d$ matrix of integers, if the sequence $(A^n x)$ is uniformly distributed mod 1.


Proposition 21 Let $A$ be a $d \times d$ matrix of integers. Then ($\lambda$-)almost all vectors $x \in \mathbb{R}^d$ are normal with respect to $A$ if and only if $A$ is nonsingular and no eigenvalue of $A$ is a root of unity.

Proof If $A$ is nonsingular and no eigenvalue of $A$ is a root of unity then, by Proposition 20, $R_A$ is an ergodic measure-preserving transformation of the torus $X = \mathbb{R}^d/\mathbb{Z}^d$. Hence, by Proposition 18(ii), for each nonzero $m \in \mathbb{Z}^d$,

$$n^{-1} \sum_{k=0}^{n-1} e(m \cdot A^k x) \to 0 \quad \text{as } n \to \infty \text{ for almost all } x \in \mathbb{R}^d.$$

Since $\mathbb{Z}^d$ is countable, and the union of a countable number of sets of measure zero is again a set of measure zero, it follows that, for almost all $x \in \mathbb{R}^d$,

$$n^{-1} \sum_{k=0}^{n-1} e(m \cdot A^k x) \to 0 \quad \text{as } n \to \infty \text{ for every nonzero } m \in \mathbb{Z}^d.$$

Hence, by Theorem 2′, almost all $x \in \mathbb{R}^d$ are normal with respect to $A$.

If $A$ is singular then, by the remark following the proof of Proposition 9, no $x \in \mathbb{R}^d$ is normal with respect to $A$. Suppose finally that some eigenvalue of $A$ is a root of unity. Then there exists a positive integer $p$ and a nonzero vector $z \in \mathbb{Z}^d$ such that $D^p z = z$, where $D = A^t$. If

$$f(x) = e(z \cdot x) + e(z \cdot Ax) + \cdots + e(z \cdot A^{p-1}x),$$

then $f(Ax) = f(x)$ and hence

$$n^{-1} \sum_{k=0}^{n-1} f(A^k x) = f(x).$$

But if $x$ is normal with respect to $A$ then, by Theorem 1′,

$$n^{-1} \sum_{k=0}^{n-1} f(A^k x) \to \int_X f\,d\lambda = 0.$$

Since $f$ is not zero a.e., it follows that the set of all $x$ which are normal with respect to $A$ does not have full measure.


We consider next when normality with respect to one matrix coincides with 
normality with respect to another matrix. 


Proposition 22 Let $A$ be a $d \times d$ nonsingular matrix of integers, no eigenvalue of which is a root of unity. Then, for any positive integer $q$, the vector $x \in \mathbb{R}^d$ is normal with respect to $A^q$ if and only if it is normal with respect to $A$.

Proof It follows at once from Proposition 9 that if $x$ is normal with respect to $A$, then it is also normal with respect to $A^q$.

Suppose, on the other hand, that $x$ is normal with respect to $A^q$. Then, by Theorem 2′, for every nonzero vector $m \in \mathbb{Z}^d$,

$$N^{-1} \sum_{n=0}^{N-1} e(m \cdot A^{nq}x) \to 0 \quad \text{as } N \to \infty.$$

Put $D = A^t$. Since $D$ is a nonsingular matrix of integers, $D^j m$ is a nonzero vector in $\mathbb{Z}^d$ for any integer $j \ge 0$ and hence

$$N^{-1} \sum_{n=0}^{N-1} e(m \cdot A^{nq+j}x) = N^{-1} \sum_{n=0}^{N-1} e(D^j m \cdot A^{nq}x) \to 0 \quad \text{as } N \to \infty.$$

Adding these relations for $j = 0, 1, \dots, q-1$ and dividing by $q$, we obtain

$$(Nq)^{-1} \sum_{n=0}^{Nq-1} e(m \cdot A^n x) \to 0 \quad \text{as } N \to \infty.$$

Since the sum of at most $q$ terms $e(m \cdot A^n x)$ has absolute value at most $q$, it follows that, also without restricting $N$ to be a multiple of $q$,

$$N^{-1} \sum_{n=0}^{N-1} e(m \cdot A^n x) \to 0 \quad \text{as } N \to \infty.$$

Hence, by Theorem 2′, $x$ is normal with respect to $A$.


Corollary 23 Let $A$ be a $d \times d$ nonsingular integer matrix, no eigenvalue of which is a root of unity, and let $B$ be a $d \times d$ integer matrix such that $A^p = B^q$ for some positive integers $p, q$. Then $x \in \mathbb{R}^d$ is normal with respect to $A$ if and only if $x$ is normal with respect to $B$.

Proof This follows at once from Proposition 22, since the hypotheses imply that also $B$ is nonsingular and has no eigenvalue which is a root of unity: $x$ is normal with respect to $A$ if and only if it is normal with respect to $A^p = B^q$, and this holds if and only if it is normal with respect to $B$.
Brown and Moran (1993) have shown, conversely, that if $A, B$ are commuting $d \times d$ nonsingular integer matrices, no eigenvalues of which are roots of unity, such that the set of all vectors normal with respect to $A$ coincides with the set of all vectors normal with respect to $B$, then $A^p = B^q$ for some positive integers $p, q$.

These results will now be specialized to the scalar case. A real number $x$ is said to be normal to the base $a$, where $a$ is an integer $\ge 2$, if the sequence $(a^n x)$ is uniformly distributed mod 1. It is readily shown that $x$ is normal to the base $a$ if and only if, in the expansion of $x$ to the base $a$:

$$x = x_0 + x_1/a + x_2/a^2 + \cdots,$$

where $x_0$ is an integer, $x_i \in \{0, 1, \dots, a-1\}$ for all $i \ge 1$ and $x_i \ne a-1$ for infinitely many $i$, every block of digits occurs with the proper frequency; i.e., for any positive integer $k$ and any $a_1, \dots, a_k \in \{0, 1, \dots, a-1\}$, the number $\nu(N)$ of $i$ with $1 \le i \le N$ such that

$$x_i = a_1, \quad x_{i+1} = a_2, \quad \dots, \quad x_{i+k-1} = a_k,$$

satisfies $\nu(N)/N \to a^{-k}$ as $N \to \infty$. By Proposition 21, almost all real numbers $x$ are normal to a given base $a$. The original proof of this by Borel (1909) was a forerunner of Birkhoff’s ergodic theorem. (In fact Borel’s proof was faulty, but his paper was influential. Borel used a different definition of normal number, but Wall (1949) showed that it was equivalent to the definition in terms of uniform distribution adopted here.)

The first published proof of the scalar case of Corollary 23 was given by Schmidt (1960), who also proved the scalar version of the result of Brown and Moran: the set of all numbers normal to the base $a$ coincides with the set of all numbers normal to the base $b$, where $a$ and $b$ are integers $\ge 2$, if and only if $a^p = b^q$ for some positive integers $p, q$.

Although almost all real numbers are normal to every base $a$, it is still not known if such familiar irrational numbers as $\sqrt{2}$, $e$ or $\pi$ are normal to some base. There are, however, various explicit constructions of normal numbers. In particular, Champernowne (1933) showed that the real number $\theta$ whose expansion to the base 10 is composed of the positive integers in their natural order, in other words, $\theta = 0.123456789101112\ldots$, is itself normal to the base 10.
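The block frequencies $\nu(N)/N$ can be counted directly on a finite prefix of Champernowne’s number. This is only a sketch of ours with hypothetical function names. A finite computation cannot establish normality, and the approach to the limiting frequencies is slow.

```python
from collections import Counter

def champernowne_digits(n_max):
    """Digits x_1 x_2 ... of Champernowne's number from writing 1, 2, ..., n_max in order."""
    return "".join(str(n) for n in range(1, n_max + 1))

def block_frequency(digits, block):
    """nu(N)/N with N = len(digits): overlapping occurrences of `block` that fit
    inside the prefix, divided by N."""
    N, k = len(digits), len(block)
    count = sum(1 for i in range(N - k + 1) if digits[i:i + k] == block)
    return count / N

digits = champernowne_digits(99_999)
counts = Counter(digits)
print(len(digits), counts["0"], counts["1"])
print({d: round(counts[d] / len(digits), 4) for d in "0123456789"})
print(block_frequency(digits, "12"))
```

The prefix built from $1, \dots, 99999$ has 488889 digits, of which 38889 are zeros and 50000 are each of $1, \dots, 9$. The zeros lag behind because a leading digit is never 0. Their relative frequency is about $0.080$ rather than $0.1$.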


(iii) Let $A$ be a set of finite cardinality $r$, which for definiteness we take to be the set $\{1, \dots, r\}$, and let $p_1, \dots, p_r$ be positive real numbers with sum 1. If $\mathcal{B}_0$ is the family of all subsets of the finite set $A$ and if, for any $B_0 \in \mathcal{B}_0$, we put $\mu_0(B_0) = \sum_{a \in B_0} p_a$, then $\mu_0$ is a probability measure and $(A, \mathcal{B}_0, \mu_0)$ is a probability space.

Now let $X$ be the set of all bi-infinite sequences $x = (\dots, x_{-2}, x_{-1}, x_0, x_1, x_2, \dots)$ with $x_i \in A$ for every $i \in \mathbb{Z}$. Thus $X$ is the product of infinitely many copies of $A$. We construct a product measure on $X$ in the following way.

For any finite sequence $(a_{-m}, \dots, a_0, \dots, a_m)$ with $a_i \in A$ for $-m \le i \le m$, define the (special) cylinder set $[a_{-m}, \dots, a_m]$ of order $m$ to be the set of all $x \in X$ such that $x_i = a_i$ for $-m \le i \le m$. There are $r^{2m+1}$ distinct cylinder sets of order $m$, distinct cylinder sets are disjoint and $X$ is the union of them all.

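Let $\mathcal{G}_m$ denote the collection of all unions of distinct cylinder sets of order $m$. Thus $X \in \mathcal{G}_m$ and, if $B \in \mathcal{G}_m$, then $B^c = X \setminus B \in \mathcal{G}_m$. Moreover $B, C \in \mathcal{G}_m$ implies $B \cup C \in \mathcal{G}_m$ and $B \cap C \in \mathcal{G}_m$. If $B \in \mathcal{G}_m$, say

$$B = [a_{-m}, \dots, a_m] \cup \cdots \cup [a'_{-m}, \dots, a'_m],$$

we define

$$\mu_m(B) = p_{a_{-m}} \cdots p_{a_m} + \cdots + p_{a'_{-m}} \cdots p_{a'_m}.$$

Then $\mu_m(X) = 1$, $\mu_m(B) \ge 0$ for every $B \in \mathcal{G}_m$, and

$$\mu_m(B \cup C) = \mu_m(B) + \mu_m(C) \quad \text{if } B, C \in \mathcal{G}_m \text{ and } B \cap C = \emptyset.$$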


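Every union of cylinder sets of order $m$ is also a union of cylinder sets of order $m+1$, since

$$[a_{-m}, \dots, a_m] = \bigcup_{j, j'=1}^{r} [j, a_{-m}, \dots, a_m, j'].$$

Thus $\mathcal{G}_m \subseteq \mathcal{G}_{m+1}$. Moreover $\mu_{m+1}$ continues $\mu_m$, since

$$\begin{aligned} \mu_{m+1}([a_{-m}, \dots, a_m]) &= \sum_{j, j'=1}^{r} p_j\,p_{j'}\,p_{a_{-m}} \cdots p_{a_m} \\ &= \mu_m([a_{-m}, \dots, a_m]) \Big(\sum_{j=1}^{r} p_j\Big)\Big(\sum_{j'=1}^{r} p_{j'}\Big) \\ &= \mu_m([a_{-m}, \dots, a_m]). \end{aligned}$$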


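Let $\mu$ denote the continuation of all $\mu_m$ to $\mathcal{G} = \mathcal{G}_0 \cup \mathcal{G}_1 \cup \cdots$. If $B, C \in \mathcal{G}$, then $B, C \in \mathcal{G}_m$ for some $m$. Moreover, $X$ is compact in the product topology (by Tychonoff’s theorem) and every set in $\mathcal{G}$ is both open and closed. Consequently, if $C \in \mathcal{G}$ is the union of a sequence of disjoint sets $C_n \in \mathcal{G}$ $(n = 1, 2, \dots)$, then $C_n = \emptyset$ for all large $n$ and $\mu(C) = \sum_{n \ge 1} \mu(C_n)$. It follows, by a construction due to Carathéodory (1914), that $\mu$ can be uniquely extended to the $\sigma$-algebra $\mathcal{B}$ of subsets of $X$ generated by $\mathcal{G}$ so that $(X, \mathcal{B}, \mu)$ is a probability space. For any $\varepsilon > 0$ there exists, for each $B \in \mathcal{B}$, some $C \in \mathcal{G}$ such that $\mu(B \triangle C) < \varepsilon$.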

The two-sided Bernoulli shift $B_{p_1, \dots, p_r}$ is the map $\sigma : X \to X$ defined by $\sigma x = x'$, where $x'_i = x_{i+1}$ for every $i \in \mathbb{Z}$. It is a measure-preserving transformation of the probability space $(X, \mathcal{B}, \mu)$, since

$$\sigma^{-1}[a_{-m}, \dots, a_m] = \bigcup_{j, j'=1}^{r} [j, j', a_{-m}, \dots, a_m]$$

and hence

$$\mu(\sigma^{-1}[a_{-m}, \dots, a_m]) = \sum_{j, j'=1}^{r} p_j\,p_{j'}\,p_{a_{-m}} \cdots p_{a_m} = \sum_{j, j'=1}^{r} p_j\,p_{j'}\,\mu([a_{-m}, \dots, a_m]) = \mu([a_{-m}, \dots, a_m]).$$

The Bernoulli shift $B_{1/2, 1/2}$ is a model for the random process consisting of bi-infinite sequences of coin-tossings.
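The invariance computation can be repeated numerically for a concrete cylinder. In this sketch of ours (function names hypothetical), symbols are $1, \dots, r$ and a word $(a_{-m}, \dots, a_m)$ is stored as a tuple. The preimage $\sigma^{-1}[a_{-m}, \dots, a_m]$ is summed over the $r^2$ cylinders $[j, j', a_{-m}, \dots, a_m]$ of order $m+1$, exactly as in the display above.

```python
from itertools import product
from math import prod

def cylinder_measure(p, word):
    """mu([a_{-m}, ..., a_m]) = p_{a_{-m}} ... p_{a_m} for a word over symbols 1..r."""
    return prod(p[a - 1] for a in word)

def preimage_measure(p, word):
    """mu(sigma^{-1}[a_{-m}, ..., a_m]), summed over the order-(m+1) cylinders
    [j, j', a_{-m}, ..., a_m] whose union is the preimage."""
    r = len(p)
    return sum(cylinder_measure(p, (j, jp) + tuple(word))
               for j, jp in product(range(1, r + 1), repeat=2))

p = (0.2, 0.3, 0.5)
word = (1, 3, 2)  # a cylinder of order m = 1
print(cylinder_measure(p, word), preimage_measure(p, word))
```

The two printed values agree up to rounding, because the extra factor $(\sum_j p_j)(\sum_{j'} p_{j'})$ equals 1.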
We may define the general cylinder set $C^{a_1, \dots, a_k}_{i_1, \dots, i_k}$, where $i_1, \dots, i_k$ are distinct integers, to be the set of all $x \in X$ such that

$$x_{i_1} = a_1, \quad \dots, \quad x_{i_k} = a_k.$$

In particular, $C^{a}_{i} = \sigma^{-i}[a]$ and hence $\mu(C^{a}_{i}) = p_a$. It follows by induction on $k$ that $\mu(C^{a_1, \dots, a_k}_{i_1, \dots, i_k}) = p_{a_1} \cdots p_{a_k}$.


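Proposition 24 For any given positive numbers $p_1, \dots, p_r$ with sum 1, the two-sided Bernoulli shift $B_{p_1, \dots, p_r}$ is ergodic.

Proof Suppose $B \in \mathcal{B}$ and $\sigma^{-1}B = B$. For any $\varepsilon > 0$ there exists a set $C \in \mathcal{G}$ such that

$$\mu(B \triangle C) = \mu(B \setminus C) + \mu(C \setminus B) < \varepsilon.$$

Then

$$|\mu(B) - \mu(C)| = |\mu(C \cap B) + \mu(B \setminus C) - \mu(C \cap B) - \mu(C \setminus B)| \le \mu(B \setminus C) + \mu(C \setminus B) < \varepsilon$$

and hence

$$|\mu(B)^2 - \mu(C)^2| = \{\mu(C) + \mu(B)\}\,|\mu(B) - \mu(C)| < 2\varepsilon.$$

We may suppose that $C$ is the union of finitely many special cylinder sets of order $m$. Since

$$\sigma^{-n}[a_{-m}, \dots, a_m] = C^{a_{-m}, \dots, a_m}_{-m+n, \dots, m+n},$$

for $n > 2m$ we have

$$[a'_{-m}, \dots, a'_m] \cap \sigma^{-n}[a_{-m}, \dots, a_m] = C^{a'_{-m}, \dots, a'_m, a_{-m}, \dots, a_m}_{-m, \dots, m, -m+n, \dots, m+n}$$

and hence

$$\mu([a'_{-m}, \dots, a'_m] \cap \sigma^{-n}[a_{-m}, \dots, a_m]) = p_{a'_{-m}} \cdots p_{a'_m}\,p_{a_{-m}} \cdots p_{a_m} = \mu([a'_{-m}, \dots, a'_m])\,\mu([a_{-m}, \dots, a_m]).$$

It follows that if $n > 2m$, then

$$\mu(C \cap \sigma^{-n}C) = \mu(C)^2.$$

But

$$\mu(B \setminus (C \cap \sigma^{-n}C)) \le 2\mu(B \setminus C),$$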
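since, because $\sigma^{-n}B = B$,

$$B \setminus (C \cap \sigma^{-n}C) \subseteq (B \setminus C) \cup (B \setminus \sigma^{-n}C) = (B \setminus C) \cup \sigma^{-n}(B \setminus C),$$

and similarly

$$\mu((C \cap \sigma^{-n}C) \setminus B) \le 2\mu(C \setminus B).$$

Hence

$$|\mu(B) - \mu(C \cap \sigma^{-n}C)| \le \mu(B \setminus (C \cap \sigma^{-n}C)) + \mu((C \cap \sigma^{-n}C) \setminus B) < 2\varepsilon.$$

Thus

$$\begin{aligned} 0 \le \mu(B) - \mu(B)^2 &= \mu(B) - \mu(C \cap \sigma^{-n}C) + \mu(C \cap \sigma^{-n}C) - \mu(B)^2 \\ &< 2\varepsilon + \mu(C)^2 - \mu(B)^2 < 4\varepsilon. \end{aligned}$$

Since $\varepsilon$ is arbitrary, we conclude that $\mu(B) = \mu(B)^2$. Hence $\mu(B) = 0$ or $1$, and $\sigma$ is ergodic.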


Similarly, if $Y$ is the set of all infinite sequences $y = (y_1, y_2, y_3, \dots)$ with $y_i \in A$ for every $i \in \mathbb{N}$, then the one-sided Bernoulli shift $B^+_{p_1, \dots, p_r}$, i.e. the map $\tau : Y \to Y$ defined by $\tau y = y'$, where $y'_i = y_{i+1}$ for every $i \in \mathbb{N}$, is a measure-preserving transformation of the analogously constructed probability space $(Y, \mathcal{C}, \nu)$. It should be noted that, although $\tau Y = Y$, $\tau$ is not invertible. In the same way as for the two-sided shift, it may be shown that the one-sided Bernoulli shift $B^+_{p_1, \dots, p_r}$ is always ergodic.


(iv) An example of some historical interest is the ‘continued fraction’ or Gauss map. Let $X = [0, 1]$ be the unit interval and $T : X \to X$ the map defined (in the notation of §1) by

$$T\xi = \begin{cases} \{\xi^{-1}\} & \text{if } \xi \in (0, 1), \\ 0 & \text{if } \xi = 0 \text{ or } 1. \end{cases}$$

Thus $T$ acts as the shift operator on the continued fraction expansion of $\xi$: if

$$\xi = [0; a_1, a_2, \dots] = \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cdots}}$$

then $T\xi = [0; a_2, a_3, \dots]$. (In the terminology of Chapter IV, the complete quotients of $\xi$ are $\xi_{n+1} = 1/T^n\xi$.)

It is not difficult to show that $T$ is a measure-preserving transformation of the probability space $(X, \mathcal{B}, \mu)$, where $\mathcal{B}$ is the family of Borel subsets of $X = [0, 1]$ and $\mu$ is the ‘Gauss’ measure defined by

$$\mu(B) = (\log 2)^{-1} \int_B \frac{dx}{1 + x}.$$

It may further be shown that $T$ is ergodic. Hence, by Birkhoff’s ergodic theorem, if $f$ is an integrable function on the interval $X$ then, for almost all $\xi \in X$,

$$\lim_{n \to \infty} n^{-1} \sum_{k=0}^{n-1} f(T^k\xi) = (\log 2)^{-1} \int_X f(x)\,(1 + x)^{-1}\,dx.$$
Here it makes no difference if ‘integrable’ and ‘almost all’ refer to the invariant measure $\mu$ or to Lebesgue measure, since $1/2 \le (1 + x)^{-1} \le 1$.
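The invariance of the Gauss measure can be checked on intervals $[0, x]$. For $0 < x < 1$, up to a null set, $T\xi \le x$ holds exactly when $\xi \in [1/(n + x), 1/n]$ for some $n \ge 1$. So $\mu(T^{-1}[0, x])$ is a series that should sum to $\mu([0, x]) = \log(1 + x)/\log 2$. The following sketch is ours, with hypothetical names, and truncates that series.

```python
import math

def gauss_measure(u, v):
    """mu([u, v]) = (log 2)^{-1} * integral_u^v dx/(1+x) = log2((1+v)/(1+u))."""
    return math.log2((1 + v) / (1 + u))

def gauss_preimage_measure(x, n_max=200_000):
    """mu(T^{-1}[0, x]) for the Gauss map, using T^{-1}[0, x] = union over n >= 1 of
    [1/(n+x), 1/n] (up to a null set). The series is truncated after n_max terms,
    which leaves an error of roughly x / (n_max log 2)."""
    return sum(gauss_measure(1 / (n + x), 1 / n) for n in range(1, n_max + 1))

for x in (0.1, 0.5, 0.9):
    print(x, gauss_measure(0, x), gauss_preimage_measure(x))
```

The agreement is to about five decimal places, limited by the truncation. The same computation with Lebesgue measure in place of $\mu$ would not balance, which is why the density $(1 + x)^{-1}$ is needed.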

Taking $f$ to be the indicator function of the set $\{\xi \in X : a_1 = m\}$, we see that the asymptotic relative frequency of the positive integer $m$ among the partial quotients $a_1, a_2, \dots$ is almost always

$$(\log 2)^{-1} \int_{(m+1)^{-1}}^{m^{-1}} (1 + x)^{-1}\,dx = (\log 2)^{-1} \log\{(m+1)^2/m(m+2)\}.$$

It follows, in particular, that almost all $\xi \in X$ have unbounded partial quotients, since each positive integer $m$ occurs among them with positive asymptotic frequency.
Again, by taking $f(\xi) = \log \xi$ it may be shown that, for almost all $\xi \in X$,

$$\lim_{n \to \infty} (1/n) \log q_n(\xi) = \pi^2/(12 \log 2),$$

where $q_n(\xi)$ is the denominator of the $n$-th convergent $p_n/q_n$ of $\xi$. This was first proved by Lévy (1929).
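Both almost-everywhere statements can be probed numerically, with the caveat that iterating $T$ in floating point loses all precision after a few dozen steps. The sketch below (ours; all names hypothetical) therefore works with an exact rational $\xi = p/2^{10000}$, with $p$ chosen by a seeded pseudo-random generator. It computes the partial quotients of $\xi$ by Euclid’s algorithm and the convergent denominators by $q_n = a_n q_{n-1} + q_{n-2}$. Only the first half of the quotients is kept, on the assumption that these behave like those of a typical real number in a tiny interval around $\xi$.

```python
import math
import random

def gauss_kuzmin(m):
    """Almost-sure frequency of the partial quotient m: log2((m+1)^2 / (m(m+2)))."""
    return math.log2((m + 1) ** 2 / (m * (m + 2)))

def partial_quotients(p, q):
    """Partial quotients a_1, a_2, ... of the rational p/q in (0, 1) (Euclid's algorithm)."""
    quotients = []
    while p:
        quotients.append(q // p)
        p, q = q % p, p
    return quotients

def convergent_denominators(quotients):
    """q_1, q_2, ... from q_{-1} = 0, q_0 = 1, q_n = a_n q_{n-1} + q_{n-2}."""
    prev, cur = 0, 1
    dens = []
    for a in quotients:
        prev, cur = cur, a * cur + prev
        dens.append(cur)
    return dens

rng = random.Random(12345)
bits = 10_000
p = rng.getrandbits(bits) | 1  # odd, so p / 2**bits is already in lowest terms
qs = partial_quotients(p, 2 ** bits)
n = len(qs) // 2
head = qs[:n]
print(n, head.count(1) / n, gauss_kuzmin(1))
print(math.log(convergent_denominators(head)[-1]) / n, math.pi ** 2 / (12 * math.log(2)))
```

One should expect the empirical frequency of the quotient 1 to be near $\log_2(4/3) \approx 0.415$, and $(1/n)\log q_n$ to be near Lévy’s constant $\approx 1.1866$. The agreement is only to within sampling error, of the order of $n^{-1/2}$.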

In a letter to Laplace, Gauss (1812) stated that, for each $x \in (0, 1)$, the proportion of $\xi \in X$ for which $T^n\xi < x$ converges as $n \to \infty$ to $\log(1 + x)/\log 2$ and he asked if Laplace could provide an estimate for the rapidity of convergence. If one writes

$$r_n(x) = m_n(x) - \log(1 + x)/\log 2,$$

where $m_n(x)$ is the Lebesgue measure of the set of all $\xi \in X$ such that $T^n\xi < x$, then Gauss’s statement is that $r_n(x) \to 0$ as $n \to \infty$ and his question is, how fast?

Gauss’s statement was first proved by Kuz’min (1928), who also gave an estimate for the rapidity of convergence. If one regards Gauss’s statement as a proposition in ergodic theory, then one needs to know that $T$ is not only ergodic but even mixing, i.e. for all $A, B \in \mathcal{B}$,

$$\mu(T^{-n}A \cap B) \to \mu(A)\mu(B) \quad \text{as } n \to \infty.$$

Kuz’min’s estimate $r_n(x) = O(q^{\sqrt{n}})$ for some $q \in (0, 1)$ was improved by Lévy (1929) and Szüsz (1961) to $r_n(x) = O(q^n)$ with $q = 0.7$ and $q = 0.485$ respectively. A substantial advance was made by Wirsing (1974). By means of an infinite-dimensional generalization of a theorem of Perron (1907) and Frobenius (1908) on positive matrices, he showed that

$$r_n(x) = (-\lambda)^n \psi(x) + O\big(x(1 - x)\mu^n\big),$$

where $\psi$ is a twice continuously differentiable function with $\psi(0) = \psi(1) = 0$, $0 < \mu < \lambda$ and $\lambda = 0.303663\ldots$. Wirsing’s analysis has been extended by Babenko (1978) and Mayer (1990).


(v) Suppose we are given a system of ordinary differential equations

$$dx/dt = f(x), \qquad (\dagger)$$

where $x \in \mathbb{R}^d$ and $f : \mathbb{R}^d \to \mathbb{R}^d$ is a continuously differentiable function. Then, for any $x \in \mathbb{R}^d$, there is a unique solution $\varphi_t(x)$ of $(\dagger)$ such that $\varphi_0(x) = x$.




Suppose further that there exists an invariant region $X \subseteq \mathbb{R}^d$. That is, $X$ is the closure of a bounded connected open set and $x \in X$ implies $\varphi_t(x) \in X$. Then the map $T_t : X \to X$ given by $T_t x = \varphi_t(x)$ is defined for every $t \in \mathbb{R}$ and satisfies $T_{t+s}x = T_t(T_s x)$.

Suppose finally that $\operatorname{div} f = 0$ for every $x \in \mathbb{R}^d$, where $x = (x_1, \dots, x_d)$, $f = (f_1, \dots, f_d)$ and

$$\operatorname{div} f := \sum_{k=1}^{d} \partial f_k/\partial x_k.$$

Then, by a theorem due to Liouville, the map $T_t$ sends an arbitrary region into a region of the same volume. (For the statement and proof of Liouville’s theorem see, for example, V.I. Arnold, Mathematical methods of classical mechanics, Springer-Verlag, New York, 1978.) It follows that if $\mathcal{B}$ is the family of Borel subsets of $X$ and $\mu$ Lebesgue measure, normalized so that $\mu(X) = 1$, then $T_t$ is a measure-preserving transformation of the probability space $(X, \mathcal{B}, \mu)$.

An important special case is the Hamiltonian system of ordinary differential equations

$$dp_i/dt = -\partial H/\partial q_i, \qquad dq_i/dt = \partial H/\partial p_i \qquad (i = 1, \dots, n),$$

where $H(p_1, \dots, p_n, q_1, \dots, q_n)$ is a twice continuously differentiable real-valued function. The divergence does indeed vanish identically in this case, since

$$-\sum_{i=1}^{n} \partial^2 H/\partial p_i\,\partial q_i + \sum_{i=1}^{n} \partial^2 H/\partial q_i\,\partial p_i = 0.$$

Furthermore, for any $h \in \mathbb{R}$, the energy surface $X : H(p, q) = h$ is invariant, since

$$dH[p(t), q(t)]/dt = \sum_{i=1}^{n} \partial H/\partial p_i\,(-\partial H/\partial q_i) + \sum_{i=1}^{n} \partial H/\partial q_i\,\partial H/\partial p_i = 0.$$

It is not difficult to show that if $\sigma$ is the volume element on $X$ induced by the Euclidean metric $\|\ \|$ on $\mathbb{R}^{2n}$, and if

$$\nabla H = (\partial H/\partial p_1, \dots, \partial H/\partial p_n, \partial H/\partial q_1, \dots, \partial H/\partial q_n)$$

is the gradient of $H$, then the maps $T_t$ preserve the measure $\mu$ on $X$ defined by

$$\mu(B) = \int_B d\sigma/\|\nabla H\|.$$

If $X$ is compact, this measure can be normalized and we obtain a family of measure-preserving transformations $T_t$ $(t \in \mathbb{R})$ of the corresponding probability space.
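Liouville’s theorem and the conservation of energy can both be observed numerically. The sketch below is ours and takes, as an assumed test case, the pendulum Hamiltonian $H(p, q) = p^2/2 - \cos q$ with $n = 1$. It integrates Hamilton’s equations with the classical fourth-order Runge-Kutta method and estimates the Jacobian determinant of $T_t$ by central differences. Runge-Kutta is not an exactly volume-preserving scheme, so the computed determinant is only close to 1, within the discretization error.

```python
import math

def pendulum_field(q, p):
    """Hamilton's equations for H(p, q) = p^2/2 - cos q: dq/dt = dH/dp, dp/dt = -dH/dq."""
    return p, -math.sin(q)

def flow(q, p, t, steps=1000):
    """Approximate T_t(q, p) with the classical fourth-order Runge-Kutta method."""
    h = t / steps
    for _ in range(steps):
        k1 = pendulum_field(q, p)
        k2 = pendulum_field(q + h / 2 * k1[0], p + h / 2 * k1[1])
        k3 = pendulum_field(q + h / 2 * k2[0], p + h / 2 * k2[1])
        k4 = pendulum_field(q + h * k3[0], p + h * k3[1])
        q += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        p += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return q, p

def energy(q, p):
    return p * p / 2 - math.cos(q)

def jacobian_det(q, p, t, eps=1e-6):
    """Central-difference estimate of det D(T_t) at (q, p); Liouville predicts 1."""
    qa, pa = flow(q + eps, p, t)
    qb, pb = flow(q - eps, p, t)
    qc, pc = flow(q, p + eps, t)
    qd, pd = flow(q, p - eps, t)
    dq_dq, dp_dq = (qa - qb) / (2 * eps), (pa - pb) / (2 * eps)
    dq_dp, dp_dp = (qc - qd) / (2 * eps), (pc - pd) / (2 * eps)
    return dq_dq * dp_dp - dq_dp * dp_dq

q0, p0 = 1.0, 0.3
q1, p1 = flow(q0, p0, 2.0)
print(energy(q0, p0), energy(q1, p1), jacobian_det(q0, p0, 2.0))
```

A symplectic integrator (for example a leapfrog scheme) would preserve area exactly at the discrete level. Runge-Kutta is used here only because it is short to write.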


(vi) Many problems arising in mechanics may be reduced by a change of variables to the geometric problem of geodesic flow. If $M$ is a smooth Riemannian manifold then the set of all pairs $(x, v)$, where $x \in M$ and $v$ is a unit vector in the tangent space to $M$ at $x$, can be given the structure of a Riemannian manifold, the unit tangent bundle $T_1M$. Evidently $T_1M$ is a $(2n-1)$-dimensional manifold if $M$ is $n$-dimensional. There is a natural measure $\mu$ on $T_1M$ such that $d\mu = dv_x\,d\omega_x$, where $dv_x$ is the volume element at $x$ of the Riemannian manifold $M$ and $\omega_x$ is Lebesgue measure on the unit sphere $S^{n-1}$ in the tangent space to $M$ at $x$. If $M$ is compact, then the measure $\mu$ can be normalized so that $\mu(T_1M) = 1$.

A geodesic on $M$ is a curve $\gamma \subseteq M$ such that the length of every curve in $M$ joining a point $x \in \gamma$ to any sufficiently close point $y \in \gamma$ is not less than the length of the arc of $\gamma$ which joins $x$ and $y$. Given any point $(x, v) \in T_1M$, there is a unique geodesic passing through $x$ in the direction of $v$. The geodesic flow on $T_1M$ is the flow $\varphi_t : T_1M \to T_1M$ defined by $\varphi_t(x, v) = (x_t, v_t)$, where $x_t$ is the point of $M$ reached from $x$ after time $t$ by travelling with unit speed along the geodesic determined by $(x, v)$ and $v_t$ is the unit tangent vector to this geodesic at $x_t$. If $M$ is compact then, for every real $t$, $\varphi_t$ is defined and is a measure-preserving transformation of the corresponding probability space $(T_1M, \mathcal{B}, \mu)$.

The geodesics on a compact 2-dimensional manifold $M$ whose curvature at each point is negative were profoundly studied by Hadamard (1898). It was first shown by E. Hopf (1939) that in this case $\varphi_t$ is ergodic for every $t \ne 0$. (We must exclude $t = 0$, since $\varphi_0$ is the identity map.) This result has been considerably generalized by
Anosov (1967) and others. In particular, the geodesic flow on a compact n-dimensional 
Riemannian manifold is ergodic if at each point the curvature of every 2-dimensional 
section is negative. 

Although the preceding examples look quite different, some of them are not ‘really’ different, i.e. apart from sets of measure zero. More precisely, if $(X_1, \mathcal{B}_1, \mu_1)$ and $(X_2, \mathcal{B}_2, \mu_2)$ are probability spaces with measure-preserving transformations $T_1 : X_1 \to X_1$ and $T_2 : X_2 \to X_2$, we say that $T_1$ is isomorphic to $T_2$ if there exist sets $X'_1 \in \mathcal{B}_1$, $X'_2 \in \mathcal{B}_2$ with $\mu_1(X'_1) = \mu_2(X'_2) = 1$ and $T_1X'_1 \subseteq X'_1$, $T_2X'_2 \subseteq X'_2$, and a bijective map $\varphi$ of $X'_1$ onto $X'_2$ such that

(i) for any $B_1 \subseteq X'_1$: $B_1 \in \mathcal{B}_1$ if and only if $\varphi(B_1) \in \mathcal{B}_2$, and then $\mu_1(B_1) = \mu_2(\varphi(B_1))$;

(ii) $\varphi(T_1x) = T_2\varphi(x)$ for every $x \in X'_1$.


For example, it is easily shown that the Bernoulli shift $B_{p_1, \dots, p_r}$ is isomorphic to the following transformation of the unit square, equipped with Lebesgue measure. Divide the square into $r$ vertical strips of width $p_1, \dots, p_r$; then contract the height of the $i$-th strip and expand its width so that it has height $p_i$ and width 1; finally combine these rectangles to form the unit square again by regarding them as horizontal strips of height $p_1, \dots, p_r$. (For $r = 2$ and $p_1 = p_2 = 1/2$, this transformation of the unit square is allegedly used by bakers when kneading dough.)
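The strip construction can be written down directly. The sketch below is ours, and the names `baker`, `baker_inverse` and `itinerary` are hypothetical. It uses exact rational arithmetic and numbers the strips from 0. The sequence of vertical strips visited by the iterates of a point gives its symbol sequence, which is the idea behind the isomorphism with $B_{p_1, \dots, p_r}$. The code illustrates this coding; it does not prove the isomorphism.

```python
from fractions import Fraction

def baker(x, y, p):
    """Generalized baker's map of [0, 1)^2 for strip widths p = (p_1, ..., p_r).
    The vertical strip s_{i-1} <= x < s_i (s_i = p_1 + ... + p_i) is stretched
    horizontally by 1/p_i, squashed vertically by p_i, and placed as the horizontal
    strip s_{i-1} <= y < s_i. The Jacobian is (1/p_i) * p_i = 1.
    Returns (x', y', i) with i the 0-based index of the strip containing x."""
    left = Fraction(0)
    for i, width in enumerate(p):
        if x < left + width:
            return (x - left) / width, left + width * y, i
        left += width
    raise ValueError("x must lie in [0, 1)")

def baker_inverse(x, y, p):
    """Inverse map: locate the horizontal strip containing y and undo the stretching."""
    left = Fraction(0)
    for width in p:
        if y < left + width:
            return left + width * x, (y - left) / width
        left += width
    raise ValueError("y must lie in [0, 1)")

def itinerary(x, y, p, n):
    """0-based indices of the vertical strips visited by the first n iterates."""
    digits = []
    for _ in range(n):
        x, y, i = baker(x, y, p)
        digits.append(i)
    return digits

half = (Fraction(1, 2), Fraction(1, 2))
print(itinerary(Fraction(11, 16), Fraction(1, 3), half, 4))
```

For $p_1 = p_2 = 1/2$ the itinerary of $x$ is its binary expansion. Here $11/16 = 0.1011$ in base 2, so the strips visited are $1, 0, 1, 1$.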

It is easily shown also that isomorphism is an equivalence relation and that it 
preserves ergodicity. However, it is usually quite difficult to show that two measure- 
preserving transformations are indeed isomorphic. A period of rapid growth was ini- 
tiated with the definition by Kolmogorov (1958), and its practical implementation by 
Sinai (1959), of a new numerical isomorphism invariant, the entropy of a measure- 
preserving transformation. For the formal definition of entropy we refer to the texts on 
ergodic theory cited at the end of the chapter. Here we merely state its value for some 
of the preceding examples. 
Any translation $T_a$ of the torus $\mathbb{R}^d/\mathbb{Z}^d$ has entropy zero, whereas the endomorphism $R_A$ of $\mathbb{R}^d/\mathbb{Z}^d$ has entropy

$$\sum_{i : |\lambda_i| > 1} \log|\lambda_i|,$$

where $\lambda_1, \dots, \lambda_d$ are the eigenvalues of the matrix $A$ and the summation is over those of them which lie outside the unit circle.

The two-sided Bernoulli shift $B_{p_1, \dots, p_r}$ has entropy

$$-\sum_{j=1}^{r} p_j \log p_j,$$

and the entropy of the one-sided Bernoulli shift $B^+_{p_1, \dots, p_r}$ is given by the same formula. It follows that $B_{1/2, 1/2}$ is not isomorphic to $B_{1/3, 1/3, 1/3}$, since the first has entropy $\log 2$ and the second has entropy $\log 3$. Ornstein (1970) established the remarkable result that two-sided Bernoulli shifts are completely classified by their entropy: $B_{p_1, \dots, p_r}$ is isomorphic to $B_{q_1, \dots, q_s}$ if and only if

$$-\sum_{j=1}^{r} p_j \log p_j = -\sum_{k=1}^{s} q_k \log q_k.$$

This is no longer true for one-sided Bernoulli shifts. Walters (1973) has shown that $B^+_{p_1, \dots, p_r}$ is isomorphic to $B^+_{q_1, \dots, q_s}$ if and only if $r = s$ and $q_1, \dots, q_s$ is a permutation of $p_1, \dots, p_r$.

The Gauss map $Tx = \{x^{-1}\}$ has entropy $\pi^2/(6 \log 2)$. Although it is mixing, it is not isomorphic to a Bernoulli shift.
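The entropy formulas above are easy to evaluate. The sketch below is ours, with hypothetical function names, and uses natural logarithms. The toral formula is specialised to $2 \times 2$ matrices, so that the eigenvalues come from the quadratic formula applied to $\lambda^2 - (\operatorname{tr} A)\lambda + \det A$.

```python
import cmath
import math

def bernoulli_entropy(p):
    """Entropy -sum p_j log p_j of the Bernoulli shift B_{p_1,...,p_r} (natural log)."""
    return -sum(pj * math.log(pj) for pj in p)

def toral_endomorphism_entropy_2x2(A):
    """Entropy sum_{|lambda| > 1} log|lambda| of R_A for a 2 x 2 integer matrix A,
    with eigenvalues from the characteristic polynomial
    lambda^2 - (tr A) lambda + det A."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    eigenvalues = ((tr + disc) / 2, (tr - disc) / 2)
    return sum(math.log(abs(lam)) for lam in eigenvalues if abs(lam) > 1)

GAUSS_MAP_ENTROPY = math.pi ** 2 / (6 * math.log(2))

print(bernoulli_entropy([1/2, 1/2]), bernoulli_entropy([1/3, 1/3, 1/3]))
print(bernoulli_entropy([1/4] * 4), bernoulli_entropy([1/2, 1/8, 1/8, 1/8, 1/8]))
print(toral_endomorphism_entropy_2x2(((2, 1), (1, 1))), GAUSS_MAP_ENTROPY)
```

For example, $(1/4, 1/4, 1/4, 1/4)$ and $(1/2, 1/8, 1/8, 1/8, 1/8)$ both give entropy $\log 4$. By Ornstein’s theorem the corresponding two-sided shifts are therefore isomorphic. By Walters’ result the one-sided shifts are not, since $r \ne s$.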

Katznelson (1971) showed that any ergodic automorphism of the torus $\mathbb{R}^d/\mathbb{Z}^d$ is isomorphic to a two-sided Bernoulli shift, and Lind (1977) has extended this result to ergodic automorphisms of any compact abelian group.

Ornstein and Weiss (1973) showed that, if $\varphi_t$ is the geodesic flow on a smooth (of class $C^3$) compact two-dimensional Riemannian manifold whose curvature at each point is negative, then $\varphi_t$ is isomorphic to a two-sided Bernoulli shift for every $t \ne 0$. Although, as Hilbert showed, a compact surface of negative curvature cannot be imbedded in $\mathbb{R}^3$, the geodesic flow on a surface of negative curvature can be realized as the motion of a particle constrained to move on a surface in $\mathbb{R}^3$ subject to centres of attraction and repulsion in the ambient space. The isomorphism with a Bernoulli shift shows that a deterministic mechanical system can generate a random process. Thus philosophical objections to ‘Laplacian determinism’ or to ‘God playing dice’ do not seem to have much point.


5 Recurrence 


It was shown by Poincaré (1890) that the paths of a Hamiltonian system of differential equations almost always return to any neighbourhood, however small, of their initial points. Poincaré’s proof was inevitably incomplete, since at the time measure theory did not exist. However, Carathéodory (1919) showed that his argument could be made rigorous with the aid of Lebesgue measure:


Proposition 25 Let T: X — X be a measure-preserving transformation of the prob- 
ability space (X, B, pw). Then almost all points of any B € & return to B infinitely 
often, i.e. for eachx € B, apartfrom a set of 4-measure zero, there exists an increasing 
sequence (nx) of positive integers such that T™*x € B (k =1,2,...). 

Furthermore, if 4(B) > 0, then u(BOT~"B) > 0 for infinitely many n > 1. 


Proof For any $N \ge 0$, put $B_N = \bigcup_{n \ge N} T^{-n}B$. Then
$$A := \bigcap_{N \ge 0} B_N$$
is the set of all points $x \in X$ such that $T^nx \in B$ for infinitely many positive integers
$n$. Since $B_{N+1} = T^{-1}B_N$, we have $\mu(B_{N+1}) = \mu(B_N)$ and hence $\mu(B_N) = \mu(B_0)$
for all $N \ge 1$. Since $B_{N+1} \subseteq B_N$, it follows that
$$\mu(A) = \lim_{N \to \infty} \mu(B_N) = \mu(B_0).$$
Since $A \subseteq B_0$, this implies
$$\mu(B_0 \setminus A) = \mu(B_0) - \mu(A) = 0$$
and hence, since $B \subseteq B_0$, $\mu(B \setminus A) = 0$.

This proves the first statement of the proposition. If $\mu(B \cap T^{-n}B) = 0$ for all
$n > m$, then $\mu(B \cap B_N) = 0$ for all $N > m$ and hence
$$\mu(B \cap A) = \lim_{N \to \infty} \mu(B \cap B_N) = 0.$$
Consequently
$$\mu(B) = \mu(B \setminus A) + \mu(B \cap A) = 0,$$
which proves the second statement of the proposition.
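As a concrete illustration of Proposition 25 (not part of the original argument), consider the rotation $Tx = x + \alpha \pmod 1$ of $[0, 1)$ with Lebesgue measure, which is measure-preserving. The sketch below assumes $\alpha = (\sqrt5 - 1)/2$ and $B = [0, 0.1)$; the function names are illustrative only. It records the return times of one point of $B$, and, since $B$ is an interval of length $L \le 1/2$, it computes $\mu(B \cap T^{-n}B)$ exactly to list some of the infinitely many $n$ for which this measure is positive.

```python
import math


def rotate(x: float, alpha: float) -> float:
    """Circle rotation T(x) = x + alpha (mod 1); it preserves Lebesgue measure on [0, 1)."""
    return (x + alpha) % 1.0


def return_times(x0: float, alpha: float, length: float, n_max: int) -> list:
    """Times n in 1..n_max at which T^n(x0) lies in B = [0, length)."""
    times = []
    x = x0
    for n in range(1, n_max + 1):
        x = rotate(x, alpha)
        if x < length:
            times.append(n)
    return times


def overlap_measure(alpha: float, n: int, length: float) -> float:
    """Exact Lebesgue measure of B ∩ T^{-n}B for B = [0, length), length <= 1/2.

    T^{-n}B is B translated by t = -n*alpha (mod 1).  Two arcs of length L whose
    left endpoints are at circular distance d overlap in an arc of length max(0, L - d).
    """
    t = (-n * alpha) % 1.0
    d = min(t, 1.0 - t)
    return max(0.0, length - d)


alpha = (math.sqrt(5) - 1) / 2
times = return_times(0.05, alpha, 0.1, 1000)
good_n = [n for n in range(1, 200) if overlap_measure(alpha, n, 0.1) > 0]
print(len(times), times[:5])
print(good_n[:5])
```

About a tenth of the first 1000 iterates return to $B$, as equidistribution of $(\{n\alpha\})$ would suggest.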


Furstenberg (1977) extended Proposition 25 in the following way:

Let $T$ be a measure-preserving transformation of the probability space $(X, \mathcal{B}, \mu)$.
If $B \in \mathcal{B}$ with $\mu(B) > 0$ and if $p \ge 2$, then $\mu(B \cap T^{-n}B \cap \cdots \cap T^{-(p-1)n}B) > 0$
for some $n \ge 1$.

His proof of this theorem made heavy use of ergodic theory and, in particular, 
of a new structure theory for measure-preserving transformations. From his theorem 
he was able to deduce quite easily a result for which Szemeredi (1975) had given a 
complicated combinatorial proof: 

Let $S$ be a subset of the set $\mathbb{N}$ of positive integers which has positive upper density;
i.e., for some $\alpha \in (0, 1)$, there exist arbitrarily long intervals $I \subseteq \mathbb{N}$ containing at least
$\alpha|I|$ elements of $S$. Then $S$ contains arithmetic progressions of arbitrary finite length.

Furstenberg’s approach to this result is not really shorter than Szemeredi’s, but it is
much more systematic. In fact the following generalization of Furstenberg’s theorem
was given soon afterwards by Furstenberg and Katznelson (1978):

If $T_1, \dots, T_p$ are commuting measure-preserving transformations of the probability
space $(X, \mathcal{B}, \mu)$ and if $B \in \mathcal{B}$ with $\mu(B) > 0$, then $\mu(B \cap T_1^{-n}B \cap \cdots \cap T_p^{-n}B) > 0$
for infinitely many $n \ge 1$.

Furstenberg and Katznelson could then deduce quite easily a multi-dimensional 
extension of Szemeredi’s theorem which is still beyond the reach of combinatorial 
methods. Szemeredi’s theorem was itself a far-reaching generalization of a famous 
theorem of van der Waerden (1927): 

If $\mathbb{N} = S_1 \cup \cdots \cup S_r$ is a partition of the set of all positive integers into finitely
many subsets, then one of the subsets $S_j$ contains arithmetic progressions of arbitrary
finite length.

Szemeredi’s result further indicates how the subset $S_j$ should be chosen.

Poincaré’s measure-theoretic recurrence theorem has a topological counterpart due 
to Birkhoff (1912): 

If $X$ is a compact metric space and $T: X \to X$ a continuous map, then there exists
a point $z \in X$ and an increasing sequence $(n_k)$ of positive integers such that $T^{n_k}z \to z$
as $k \to \infty$.

Before Furstenberg and Katznelson proved their measure-theoretic theorem, 
Furstenberg and Weiss (1978) had already proved its topological counterpart: 


If $X$ is a compact metric space and $T_1, \dots, T_p$ commuting continuous maps of $X$
into itself, then there exists a point $z \in X$ and an increasing sequence $(n_k)$ of positive
integers such that $T_i^{n_k}z \to z$ as $k \to \infty$ $(i = 1, \dots, p)$.


From their theorem Furstenberg and Weiss were able to deduce quite easily both 
van der Waerden’s theorem and a known multi-dimensional generalization of it, due 
to Grunwald. It would take too long to prove here Szemeredi’s theorem by the method 
of Furstenberg and Katznelson, but we will prove van der Waerden’s theorem by the 
method of Furstenberg and Weiss. The proof illustrates how results in one area of 
mathematics can find application in another area which is apparently unrelated. 


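Proposition 26 Let $(X, d)$ be a compact metric space and $T: X \to X$ a continuous
map. Then, for any real $\varepsilon > 0$ and any $p \in \mathbb{N}$, there exists some $z \in X$ and $n \in \mathbb{N}$
such that
$$d(T^nz, z) < \varepsilon, \quad d(T^{2n}z, z) < \varepsilon, \quad \dots, \quad d(T^{pn}z, z) < \varepsilon.$$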


Proof (i) A subset $A$ of $X$ is said to be invariant under $T$ if $TA \subseteq A$. The closure $\bar{A}$ of
an invariant set $A$ is again invariant since, by the continuity of $T$, $T\bar{A} \subseteq \overline{TA} \subseteq \bar{A}$. Let $\mathcal{F}$ be
the collection of all nonempty closed invariant subsets of $X$. Clearly $\mathcal{F}$ is not empty,
since $X \in \mathcal{F}$. If we regard $\mathcal{F}$ as partially ordered by inclusion then, by Hausdorff’s
maximality theorem, $\mathcal{F}$ contains a maximal totally ordered subcollection $\mathcal{T}$. The
intersection $Z$ of all the subsets in $\mathcal{T}$ is both closed and invariant. It is also nonempty,
since $X$ is compact and any finitely many subsets in $\mathcal{T}$ have nonempty intersection.
Hence $Z \in \mathcal{F}$ and, by construction, no nonempty proper closed subset of $Z$ is invariant.

By replacing $X$ by its compact subset $Z$ we may now assume that the only closed
invariant subsets of $X$ itself are $X$ and $\emptyset$.
(ii) For any given $z \in X$, the closure of the set $(T^nz)_{n \ge 1}$ is a nonempty closed
invariant subset of $X$ and therefore coincides with $X$. Thus for every $\varepsilon > 0$ there exists
$n = n(\varepsilon) \ge 1$ such that $d(T^nz, z) < \varepsilon$. This proves the proposition for $p = 1$.

We suppose now that $p > 1$ and the proposition holds with $p$ replaced by $p - 1$.


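(iii) We show next that, for any $\varepsilon > 0$, there exists a finite set $K$ of positive integers
such that, for all $x, x' \in X$,
$$d(T^kx', x) < \varepsilon/2 \quad \text{for some } k \in K.$$
If $B$ is a nonempty open subset of $X$, then by (ii) for every $z \in X$ there exists some $n \ge 1$
such that $T^nz \in B$. Hence $X = \bigcup_{n \ge 1} T^{-n}B$. Since $X$ is compact and the sets $T^{-n}B$
are open, there is a finite set $K(B)$ of positive integers such that
$$X = \bigcup_{k \in K(B)} T^{-k}B.$$
Since $X$ is compact again, there exist finitely many open balls $B_1, \dots, B_r$ with radius
$\varepsilon/4$ such that $X = B_1 \cup \cdots \cup B_r$. If $x, x' \in X$, then $x \in B_i$ for some $i \in \{1, \dots, r\}$
and $x' \in T^{-k}B_i$ for some $k \in K(B_i)$, so that $x$ and $T^kx'$ both lie in the ball $B_i$ of
radius $\varepsilon/4$ and $d(T^kx', x) < \varepsilon/2$. Thus we can take $K = K(B_1) \cup \cdots \cup K(B_r)$.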


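(iv) We now show that, for any $\varepsilon > 0$ and any $x \in X$, there exist $y \in X$ and $n \ge 1$
such that
$$d(T^ny, x) < \varepsilon, \quad d(T^{2n}y, x) < \varepsilon, \quad \dots, \quad d(T^{pn}y, x) < \varepsilon.$$
In fact, since each $T^k$ $(k \in K)$ is uniformly continuous on $X$, we can choose $\rho > 0$
so that $d(x_1, x_2) < \rho$ implies $d(T^kx_1, T^kx_2) < \varepsilon/2$ for all $x_1, x_2 \in X$ and all $k \in K$.
By the induction hypothesis, there exist $x' \in X$ and $n \ge 1$ such that
$$d(T^nx', x') < \rho, \quad \dots, \quad d(T^{(p-1)n}x', x') < \rho.$$
But the invariant set $TX$ is closed, since $X$ is compact, and so $TX = X$. Hence
$T^nX = X$ and we can choose $y' \in X$ so that $T^ny' = x'$. Thus
$$d(T^ny', x') = 0, \quad d(T^{2n}y', x') < \rho, \quad \dots, \quad d(T^{pn}y', x') < \rho.$$
It follows that, for all $k \in K$,
$$d(T^{k+n}y', T^kx') < \varepsilon/2, \quad \dots, \quad d(T^{k+pn}y', T^kx') < \varepsilon/2.$$
For each $x \in X$ there is a $k \in K$ such that $d(T^kx', x) < \varepsilon/2$. Thus if $y = T^ky'$, then
$$d(T^ny, x) < \varepsilon, \quad \dots, \quad d(T^{pn}y, x) < \varepsilon.$$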
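(v) Let $\varepsilon_0 > 0$ and $x_0 \in X$ be given. By (iv) there exist $x_1 \in X$ and $n_1 \ge 1$ such that
$$d(T^{n_1}x_1, x_0) < \varepsilon_0, \quad \dots, \quad d(T^{pn_1}x_1, x_0) < \varepsilon_0.$$
We can now choose $\varepsilon_1 \in (0, \varepsilon_0)$ so that $d(x, x_1) < \varepsilon_1$ implies
$$d(T^{n_1}x, x_0) < \varepsilon_0, \quad \dots, \quad d(T^{pn_1}x, x_0) < \varepsilon_0.$$
Suppose we have defined points $x_1, \dots, x_k$, positive integers $n_1, \dots, n_k$, and
$\varepsilon_1, \dots, \varepsilon_k \in (0, \varepsilon_0)$ such that, for $i = 1, \dots, k$,
$$d(T^{n_i}x_i, x_{i-1}) < \varepsilon_{i-1}, \quad \dots, \quad d(T^{pn_i}x_i, x_{i-1}) < \varepsilon_{i-1}$$
and $d(x, x_i) < \varepsilon_i$ implies
$$d(T^{n_i}x, x_{i-1}) < \varepsilon_{i-1}, \quad \dots, \quad d(T^{pn_i}x, x_{i-1}) < \varepsilon_{i-1}.$$
By (iv) there exist $x_{k+1} \in X$ and $n_{k+1} \ge 1$ such that
$$d(T^{n_{k+1}}x_{k+1}, x_k) < \varepsilon_k, \quad \dots, \quad d(T^{pn_{k+1}}x_{k+1}, x_k) < \varepsilon_k,$$
and we can then choose $\varepsilon_{k+1} \in (0, \varepsilon_0)$ so that $d(x, x_{k+1}) < \varepsilon_{k+1}$ implies
$$d(T^{n_{k+1}}x, x_k) < \varepsilon_k, \quad \dots, \quad d(T^{pn_{k+1}}x, x_k) < \varepsilon_k.$$
Thus the process can be continued indefinitely.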
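By taking successively $i = j - 1, j - 2, \dots$ we see that, if $i < j$, then
$$d(T^{n_{i+1} + \cdots + n_j}x_j, x_i) < \varepsilon_i, \quad \dots, \quad d(T^{p(n_{i+1} + \cdots + n_j)}x_j, x_i) < \varepsilon_i.$$
Since $X$ is compact, it is covered by a finite number $r$ of open balls with radius $\varepsilon_0/2$.
Hence there exist $i, j$ with $0 \le i < j \le r$ such that $d(x_i, x_j) < \varepsilon_0$. If we put
$n = n_{i+1} + \cdots + n_{j-1} + n_j$ then, since $\varepsilon_i \le \varepsilon_0$, we obtain from the triangle inequality
$$d(T^nx_j, x_j) < 2\varepsilon_0, \quad \dots, \quad d(T^{pn}x_j, x_j) < 2\varepsilon_0.$$
But $\varepsilon_0 > 0$ was arbitrary.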
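Proposition 26 can be checked by hand in one simple case, assumed here purely for illustration: the rotation $Tz = z + \alpha \pmod 1$ of the circle $\mathbb{R}/\mathbb{Z}$ with the distance $\|u\|$ to the nearest integer. Here $d(T^{jn}z, z) = \|jn\alpha\|$ does not depend on $z$, and since $\|jn\alpha\| \le j\|n\alpha\|$, any $n$ with $\|n\alpha\| < \varepsilon/p$ works. The sketch below (hypothetical helper names; a brute-force search rather than the topological argument above) finds the least $n$ for which all $p$ inequalities hold.

```python
import math
from typing import Optional


def circle_dist(u: float) -> float:
    """Distance from u to the nearest integer: the metric on the circle R/Z."""
    f = u % 1.0
    return min(f, 1.0 - f)


def multiple_return(alpha: float, p: int, eps: float, n_max: int = 10**6) -> Optional[int]:
    """Least n <= n_max with ||j n alpha|| < eps for j = 1, ..., p, or None."""
    for n in range(1, n_max + 1):
        if all(circle_dist(j * n * alpha) < eps for j in range(1, p + 1)):
            return n
    return None


alpha = math.sqrt(2)
n = multiple_return(alpha, p=3, eps=0.01)
print(n, [round(circle_dist(j * n * alpha), 5) for j in range(1, 4)])
```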


It may be deduced from Proposition 26, by means of Baire’s category theorem,
that under the same hypotheses there exists a point $z \in X$ and an increasing sequence
$(n_k)$ of positive integers such that $T^{in_k}z \to z$ as $k \to \infty$ $(i = 1, \dots, p)$. However, as
we now show, Proposition 26 already suffices to prove van der Waerden’s theorem.

The set $X^*$ of all infinite sequences $x = (x_1, x_2, \dots)$, where $x_i \in \{1, 2, \dots, r\}$
for every $i \ge 1$, can be given the structure of a compact metric space by defining
$d(x, x) = 0$ and $d(x, y) = 2^{-k}$ if $x \ne y$ and $k$ is the least positive integer such that
$x_k \ne y_k$. The shift map $\tau: X^* \to X^*$, defined by $\tau((x_1, x_2, \dots)) = (x_2, x_3, \dots)$, is
continuous, since
$$d(\tau(x), \tau(y)) \le 2d(x, y).$$
With the partition $\mathbb{N} = S_1 \cup \cdots \cup S_r$ in the statement of van der Waerden’s theorem
we associate the infinite sequence $x \in X^*$ defined by $x_i = j$ if $i \in S_j$.

Let $X$ denote the closure of the set $(\tau^nx)_{n \ge 1}$. Then $X$ is a closed subset of $X^*$
which is invariant under $\tau$. By Proposition 26, there exists a point $z \in X$ and a positive
integer $n$ such that
$$d(\tau^nz, z) < 1/2, \quad d(\tau^{2n}z, z) < 1/2, \quad \dots, \quad d(\tau^{pn}z, z) < 1/2;$$
i.e. $z_1 = z_{n+1} = z_{2n+1} = \cdots = z_{pn+1}$. Since $z \in X$, there is a positive integer $m$ such
that $d(\tau^mx, z) < 2^{-(pn+1)}$, i.e. $x_{m+i} = z_i$ for $1 \le i \le pn + 1$. It follows that
$$x_{m+1} = x_{m+n+1} = \cdots = x_{m+pn+1}.$$
Thus for every positive integer $p$ there is a set $S_{j(p)}$ which contains an arithmetic
progression of length $p$. Since there are only $r$ possible values for $j(p)$, one of the sets
$S_j$ must contain arithmetic progressions of arbitrary finite length.
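The proof only asserts that a progression $x_{m+1} = x_{m+n+1} = \cdots$ exists, but for a concrete colouring one can simply search for it. As an assumed example, colour each positive integer by the parity of the number of 1s in its binary expansion (the Thue–Morse sequence, so $r = 2$). The hypothetical helper below returns the first term $a$ and the common difference $d$ of a monochromatic progression of given length, trying $a = 1, 2, \dots$ and, for each $a$, $d = 1, 2, \dots$.

```python
from typing import Callable, Optional, Tuple


def thue_morse(i: int) -> int:
    """Colour of i in {0, 1}: parity of the number of 1s in the binary expansion of i."""
    return bin(i).count("1") % 2


def find_mono_ap(colour: Callable[[int], int], length: int, limit: int) -> Optional[Tuple[int, int]]:
    """Return (a, d) with colour(a) = colour(a + d) = ... = colour(a + (length - 1) d),
    all terms <= limit, or None if no such progression lies below limit."""
    for a in range(1, limit + 1):
        d = 1
        while a + (length - 1) * d <= limit:
            c = colour(a)
            if all(colour(a + k * d) == c for k in range(1, length)):
                return a, d
            d += 1
    return None


print(find_mono_ap(thue_morse, 3, 100))  # (1, 3): 1, 4, 7 each have an odd number of binary 1s
print(find_mono_ap(thue_morse, 4, 100))
```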

A far-reaching generalization of van der Waerden’s theorem has been given by
Hales and Jewett (1963). Let $A = \{a_1, \dots, a_q\}$ be a finite set and let $A^n$ be the set
of all $n$-tuples with elements from $A$. A set $W = \{w^1, \dots, w^q\} \subseteq A^n$ of $q$ $n$-tuples
$w^k = (w^k_1, \dots, w^k_n)$ is said to be a combinatorial line if there exists a partition
$$\{1, \dots, n\} = I \cup J, \quad I \cap J = \emptyset,$$
such that
$$w^k_i = a_k \ (k = 1, \dots, q) \ \text{for } i \in I; \qquad w^1_j = \cdots = w^q_j \ \text{for } j \in J.$$

The Hales–Jewett theorem says that, for any positive integer $r$, there exists a positive
integer $N = N(q, r)$ such that, if $A^N$ is partitioned into $r$ classes, then at least one
of these classes contains a combinatorial line.

If one takes $A = \{0, 1, \dots, q - 1\}$ and interprets $A^n$ as the set of expansions to base
$q$ of all non-negative integers less than $q^n$, then a combinatorial line is an arithmetic
progression of length $q$. On the other hand, if one takes $A = \mathbb{F}_q$ to be a finite field with $q$ elements
and interprets $A^n$ as the $n$-dimensional vector space $\mathbb{F}_q^n$, then a combinatorial line is
an affine line. The interesting feature of the Hales–Jewett theorem is that it is purely
combinatorial and does not involve any notion of addition.
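The definition of a combinatorial line is easy to enumerate. In the sketch below (illustrative names, assuming $A = \{0, 1, \dots, q - 1\}$), a line is encoded by a template in $(A \cup \{*\})^n$ containing at least one $*$: the $*$ positions form $I$, and the $k$-th point of the line puts $k$ in every position of $I$. This gives $(q + 1)^n - q^n$ lines, and each is an arithmetic progression when words are read as base-$q$ expansions. A brute-force search over all colourings is feasible only for tiny $q$, $n$, $r$; for $q = 2$, $r = 2$ it shows that $n = 1$ is not enough but $n = 2$ is.

```python
from itertools import product


def combinatorial_lines(q: int, n: int) -> list:
    """All combinatorial lines in A^n, A = {0, ..., q-1}, each given as a list of q words."""
    lines = []
    for template in product(list(range(q)) + ["*"], repeat=n):
        if "*" in template:
            lines.append([tuple(k if t == "*" else t for t in template) for k in range(q)])
    return lines


def as_base_q(word: tuple, q: int) -> int:
    """Read (w_1, ..., w_n) as the base-q integer w_1 + w_2 q + ... + w_n q^(n-1)."""
    return sum(digit * q**i for i, digit in enumerate(word))


def every_colouring_has_line(q: int, n: int, r: int) -> bool:
    """Brute force: does every r-colouring of A^n contain a monochromatic combinatorial line?"""
    points = list(product(range(q), repeat=n))
    index = {pt: i for i, pt in enumerate(points)}
    lines = [[index[pt] for pt in line] for line in combinatorial_lines(q, n)]
    for colouring in product(range(r), repeat=len(points)):
        if not any(len({colouring[i] for i in line}) == 1 for line in lines):
            return False  # this colouring has no monochromatic line
    return True


print(len(combinatorial_lines(3, 2)))  # 7 = 4**2 - 3**2
print(every_colouring_has_line(2, 1, 2), every_colouring_has_line(2, 2, 2))  # False True
```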


6 Further Remarks 


Uniform distribution and discrepancy are thoroughly discussed in Kuipers and Nieder- 
reiter [30]. For later results, see Drmota and Tichy [13]. Since these two books have 
extensive bibliographies, we will be sparing with references. However, it would be re- 
miss not to recommend the great paper of Weyl [52], which remains as fresh as when 
it was written. 

Lemma 0 is often attributed to Polya (1920), but it was already proved by Buchanan 
and Hildebrandt [9]. 

Fejér’s proof that continuous periodic functions can be uniformly approximated by
trigonometric polynomials is given in Dym and McKean [15]. The theorem also follows
directly from the theorem of Weierstrass (1885) on the uniform approximation
of continuous functions by ordinary polynomials. A remarkable generalization of both
results was given by Stone (1937); see Stone [49]. The ‘Stone–Weierstrass theorem’ is
also proved in Rudin [44], for example.

Chen [11] gives a quantitative version of Kronecker’s theorem of a different type 
from Proposition 3’. 
The converse of Proposition 10 is proved by Kemperman [27]. For the history of
the problem of mean motion, and generalizations to almost periodic functions, see 
Jessen and Tornehave [24]. Methods for estimating exponential sums were developed 
in connection with the theory of uniform distribution, but then found other applica- 
tions. See Chandrasekharan [10] and Graham and Kolesnik [21]. 

For applications of discrepancy to numerical integration, see Niederreiter [36, 37]. 
For the basic properties of functions of bounded variation and the definition of total 
variation see, for example, Riesz and Sz.-Nagy [42]. 

Sharper versions of the original Erdős–Turán inequality are proved by Niederreiter
and Philipp [38] and in Montgomery [35]. The discrepancy of the sequence $(\{n\alpha\})$,
where $\alpha$ is an irrational number whose continued fraction expansion has bounded
partial quotients (i.e., is badly approximable), is discussed by Dupain and Sós [14].
The discrepancy of the sequence $(\{n\alpha\})$, where $\alpha \in \mathbb{R}^d$, has been deeply studied by
Beck [3]. The work of Roth, Schmidt and others is treated in Beck and Chen [4].

For accounts of measure theory, see Billingsley [6], Halmos [22], Loève [32]
and Saks [46]. More detailed treatments of ergodic theory are given in the books of 
Petersen [39], Walters [51] and Cornfeld et al. [12]. The prehistory of ergodic theory 
is described by the Ehrenfests [16]. However, they do not refer to the paper of Poincaré 
(1894), which is reproduced in [41]. 

The proof of Birkhoff’s ergodic theorem given here follows Katznelson and 
Weiss [26]. A different proof is given in the book of Walters. 

Many other ergodic theorems besides Birkhoff’s are discussed in Krengel [29]. We
mention only the subadditive ergodic theorem of Kingman (1968): if $T$ is a measure-preserving
transformation of the probability space $(X, \mathcal{B}, \mu)$ and if $(g_n)$ is a sequence
of functions in $L(X, \mathcal{B}, \mu)$ such that $\inf_n n^{-1}\int_X g_n\,d\mu > -\infty$ and, for all $m, n \ge 1$,
$$g_{n+m}(x) \le g_n(x) + g_m(T^nx) \quad \text{a.e.},$$
then $n^{-1}g_n(x) \to g^*(x)$ a.e., where $g^*(Tx) = g^*(x)$ a.e., $g^* \in L(X, \mathcal{B}, \mu)$ and
$$\int_X g^*\,d\mu = \lim_{n \to \infty} n^{-1}\int_X g_n\,d\mu = \inf_n n^{-1}\int_X g_n\,d\mu.$$
Birkhoff’s ergodic theorem may be regarded as a special case by taking $g_n(x) = \sum_{k=0}^{n-1} f(T^kx)$.
A simple proof of Kingman’s theorem is given by Steele [48]. For
applications of Kingman’s theorem to percolation processes and products of random
matrices, see Kingman [28]. The multiplicative ergodic theorem of Oseledets is derived
from Kingman’s theorem by Ruelle [45].

The book of Kuipers and Niederreiter cited above has an extensive discussion of
normal numbers. For normality with respect to a matrix, see also Brown and Moran [8].

Proofs of Gauss’s statement on the continued fraction map are contained in the
books by Billingsley [7] and Rockett and Szüsz [43]. For more recent work, see
Wirsing [53], Babenko [2] and Mayer [33]. For the deviation of $(1/n)\log q_n(\xi)$
from its (a.e.) limiting value $\pi^2/(12\log 2)$ there are analogues of the central limit
theorem and the law of the iterated logarithm; see Philipp and Stackelberg [40]. For
higher-dimensional generalizations of Gauss’s invariant measure, see Hardcastle and
Khanin [23].

Applications of ergodic theory to classical mechanics are discussed in the books of
Arnold and Avez [1] and Katok and Hasselblatt [25]. For connections between ergodic 
theory and the ‘3x + 1 problem’, see Lagarias [31]. 

Ergodic theory has been used to generalize considerably some of the results on
lattices in Chapter VIII. A lattice in a locally compact group $G$ is a discrete subgroup $\Gamma$
such that the $G$-invariant measure of the quotient space $G/\Gamma$ is finite. (In Chapter VIII,
$G = \mathbb{R}^n$ and $\Gamma = \mathbb{Z}^n$.) Zimmer [54] gives a good introduction to the results which
have been obtained in this area.

An attractive account of the work of Furstenberg and his collaborators is given in 
Furstenberg [17]. See also Graham et al. [20] and the book of Petersen cited above. 
The discovery of van der Waerden’s theorem is described in van der Waerden [50]. For 
a recent direct proof, see Mills [34]. 

The direct proofs reduce the theorem to an equivalent finite form: for any positive 
integer p, there exists a positive integer N such that, whenever the set {1,2,..., N} 
is partitioned into two subsets, at least one subset contains an arithmetic progression 
of length p. The original proofs provided an upper bound for the least possible value 
N(p) of N, but it was unreasonably large. Some progress towards obtaining reasonable 
upper bounds has recently been made by Shelah [47] and Gowers [19]. 
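For very small cases the finite form can be checked by exhaustive search. The sketch below (hypothetical helper names) tests every 2-colouring of $\{1, \dots, N\}$ for a monochromatic arithmetic progression of length $p$. For $p = 3$ it confirms the well-known value $N(3) = 9$: colouring $1, 2, 5, 6$ red and $3, 4, 7, 8$ blue shows that $N = 8$ does not suffice. The search is exponential in $N$, so it is of no help with the large bounds discussed above.

```python
from itertools import product


def has_mono_ap(colouring: tuple, length: int) -> bool:
    """colouring[i] is the colour of the integer i + 1; look for a monochromatic AP."""
    N = len(colouring)
    for a in range(N):
        for d in range(1, N):
            if a + (length - 1) * d >= N:
                break
            if len({colouring[a + k * d] for k in range(length)}) == 1:
                return True
    return False


def vdw_holds(N: int, length: int, r: int = 2) -> bool:
    """True if every r-colouring of {1, ..., N} has a monochromatic AP of the given length."""
    return all(has_mono_ap(c, length) for c in product(range(r), repeat=N))


print(vdw_holds(8, 3), vdw_holds(9, 3))  # False True
```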

The Hales—Jewett theorem is proved, and then extensively generalized, in 
Bergelson and Leibman [5]. Furstenberg and Katznelson [18] prove a density ver- 
sion of the Hales—Jewett theorem, analogous to Szemeredi’s density version of van der 
Waerden’s theorem. 
XII


Elliptic Functions 


Our discussion of elliptic functions may be regarded as an essay in revisionism, since 
we do not use Liouville’s theorem, Riemann surfaces or the Weierstrassian functions. 
We wish to show that the methods used by the founding fathers of the subject provide 
a natural and rigorous approach, which is very well suited for applications. 

The work is arranged so that the initial sections are mutually independent, although 
motivation for each section is provided by those which precede it. To some extent we 
have also separated the discussion for real and for complex parameters, so that those 
interested only in the real case may skip the complex one. 


1 Elliptic Integrals 


After the development of the integral calculus in the second half of the 17th century, 
it was natural to apply it to the determination of the arc length of an ellipse since, by 
Kepler’s first law, the planets move in elliptical orbits with the sun at one focus. 

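An ellipse is described in rectangular coordinates by an equation
$$x^2/a^2 + y^2/b^2 = 1,$$
where $a$ and $b$ are the semi-axes of the ellipse $(a > b > 0)$. It is also given parametrically by
$$x = a\sin\theta, \quad y = b\cos\theta \quad (0 \le \theta \le 2\pi).$$
The arc length $s(\varphi)$ from $\theta = 0$ to $\theta = \varphi$ is given by
$$\begin{aligned}
s(\varphi) &= \int_0^{\varphi} [(dx/d\theta)^2 + (dy/d\theta)^2]^{1/2}\,d\theta\\
&= \int_0^{\varphi} (a^2\cos^2\theta + b^2\sin^2\theta)^{1/2}\,d\theta\\
&= \int_0^{\varphi} [a^2 - (a^2 - b^2)\sin^2\theta]^{1/2}\,d\theta.
\end{aligned}$$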
0 
If we put $b^2 = a^2(1 - k^2)$, where $k$ $(0 < k < 1)$ is the eccentricity of the ellipse, this
takes the form
$$s(\varphi) = a\int_0^{\varphi} (1 - k^2\sin^2\theta)^{1/2}\,d\theta.$$
If we further put $z = \sin\theta = x/a$ and restrict attention to the first quadrant, this
assumes the algebraic form
$$s = a\int_0^{Z} [(1 - k^2z^2)/(1 - z^2)]^{1/2}\,dz.$$
Since the arc length of the whole quadrant is obtained by taking $Z = 1$, the arc length
of the whole ellipse is
$$4a\int_0^1 [(1 - k^2z^2)/(1 - z^2)]^{1/2}\,dz.$$
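The complete integral for the perimeter is easy to evaluate numerically. The sketch below is only an illustration under stated assumptions: it applies composite Simpson's rule (our choice, not a method from the text) to the equivalent form $4a\int_0^{\pi/2}(1 - k^2\sin^2\theta)^{1/2}\,d\theta$, whose integrand is smooth. It reproduces the circumference $2\pi a$ when $b = a$ and the degenerate value $4a$ when $b = 0$. As an independent rough check we compare with Ramanujan's approximation $\pi[3(a + b) - ((3a + b)(a + 3b))^{1/2}]$.

```python
import math


def simpson(f, lo: float, hi: float, n: int = 1000) -> float:
    """Composite Simpson's rule with n (made even) subintervals."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def ellipse_perimeter(a: float, b: float) -> float:
    """4a ∫_0^{π/2} (1 - k² sin²θ)^{1/2} dθ for semi-axes a >= b >= 0, where k² = 1 - b²/a²."""
    k2 = 1.0 - (b / a) ** 2
    integrand = lambda th: math.sqrt(max(0.0, 1.0 - k2 * math.sin(th) ** 2))
    return 4.0 * a * simpson(integrand, 0.0, math.pi / 2)


ramanujan = math.pi * (3 * 3.0 - math.sqrt((3 * 2.0 + 1.0) * (2.0 + 3 * 1.0)))
print(ellipse_perimeter(1.0, 1.0), 2 * math.pi)
print(ellipse_perimeter(2.0, 1.0), ramanujan)
```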
0 


Consider next Galileo’s problem of the simple pendulum. If $\theta$ is the angle of deflection
from the downward vertical, the equation of motion of the pendulum is
$$d^2\theta/dt^2 + (g/l)\sin\theta = 0,$$
where $l$ is the length of the pendulum and $g$ is the acceleration due to gravity. This
differential equation has the first integral
$$(d\theta/dt)^2 = (2g/l)(\cos\theta - a),$$
where $a$ is a constant. In fact $a < 1$ for a real motion, and for oscillatory motion we
must also have $a > -1$. We can then put $a = \cos\alpha$ $(0 < \alpha < \pi)$, where $\alpha$ is the
maximum value of $\theta$, and integrate again to obtain
$$\begin{aligned}
t &= (l/2g)^{1/2}\int_0^{\theta} (\cos\theta - \cos\alpha)^{-1/2}\,d\theta\\
&= (l/4g)^{1/2}\int_0^{\theta} (\sin^2\alpha/2 - \sin^2\theta/2)^{-1/2}\,d\theta.
\end{aligned}$$
Putting $k = \sin\alpha/2$ and $kx = \sin\theta/2$, we can rewrite this in the form
$$t = (l/g)^{1/2}\int_0^{X} [(1 - x^2)(1 - k^2x^2)]^{-1/2}\,dx.$$
The angle of deflection $\theta$ attains its maximum value $\alpha$ when $X = 1$, and the motion is
periodic with period
$$T = 4(l/g)^{1/2}\int_0^1 [(1 - x^2)(1 - k^2x^2)]^{-1/2}\,dx.$$
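The period formula becomes easy to evaluate once the substitution $x = \sin\phi$ removes the endpoint singularity, giving $T = 4(l/g)^{1/2}\int_0^{\pi/2}(1 - k^2\sin^2\phi)^{-1/2}\,d\phi$. The sketch below uses Simpson's rule (an assumed numerical choice) and compares the result with the small-oscillation period $2\pi(l/g)^{1/2}$. The values $l = 1$ and $g = 9.81$ are illustrative; the ratio of the two periods does not depend on them, and for an amplitude of $90^\circ$ it is about $1.18034$.

```python
import math


def simpson(f, lo: float, hi: float, n: int = 1000) -> float:
    """Composite Simpson's rule with n (made even) subintervals."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def pendulum_period(length: float, g: float, amplitude: float) -> float:
    """T = 4 (l/g)^{1/2} ∫_0^{π/2} (1 - k² sin²φ)^{-1/2} dφ with k = sin(amplitude / 2)."""
    k = math.sin(amplitude / 2)
    integral = simpson(lambda phi: 1.0 / math.sqrt(1.0 - (k * math.sin(phi)) ** 2), 0.0, math.pi / 2)
    return 4.0 * math.sqrt(length / g) * integral


small = 2 * math.pi * math.sqrt(1.0 / 9.81)
print(pendulum_period(1.0, 9.81, 0.01) / small)
print(pendulum_period(1.0, 9.81, math.pi / 2) / small)
```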


Attempts to evaluate the integrals in both these problems in terms of algebraic and 
elementary transcendental functions proved fruitless. Thus the idea arose of treating 
them as fundamental entities in terms of which other integrals could be expressed. 
An example is the determination of the arc length of a lemniscate. This curve,
which was studied by Jacob Bernoulli (1694), has the form of a figure of eight and is
the locus of all points $z \in \mathbb{C}$ such that $|2z^2 - 1| = 1$ or, in polar coordinates,
$$r^2 = \cos 2\theta \quad (-\pi/4 \le \theta \le \pi/4 \ \cup \ 3\pi/4 \le \theta \le 5\pi/4).$$
If $-\pi/4 \le \varphi \le 0$, the arc length $s(\varphi)$ from $\theta = -\pi/4$ to $\theta = \varphi$ is given by
$$\begin{aligned}
s(\varphi) &= \int_{-\pi/4}^{\varphi} [r^2 + (dr/d\theta)^2]^{1/2}\,d\theta\\
&= \int_{-\pi/4}^{\varphi} [r^2 + (1 - r^4)/r^2]^{1/2}\,d\theta\\
&= \int_0^{r} (1 - r^4)^{-1/2}\,dr.
\end{aligned}$$
If we make the change of variables $x = \sqrt{2}\,r/(1 + r^2)^{1/2}$, then on account of
$$dx/dr = \sqrt{2}/(1 + r^2)^{3/2}$$
we obtain
$$s(\varphi) = 2^{-1/2}\int_0^{X} [(1 - x^2/2)(1 - x^2)]^{-1/2}\,dx.$$
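The change of variables can be checked numerically. In the sketch below (hypothetical function names, Simpson's rule as an assumed quadrature) the two sides are compared for an upper limit $r = R < 1$, where both integrands are smooth. For the full quarter of the curve ($R = X = 1$) the further substitution $x = \sin\phi$ removes the endpoint singularity, giving $\int_0^1(1 - r^4)^{-1/2}\,dr \approx 1.3110287771$, which is half of the lemniscate constant $\varpi \approx 2.6220575543$.

```python
import math


def simpson(f, lo: float, hi: float, n: int = 1000) -> float:
    """Composite Simpson's rule with n (made even) subintervals."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def arc_polar(R: float) -> float:
    """∫_0^R (1 - r^4)^{-1/2} dr, for 0 <= R < 1."""
    return simpson(lambda r: 1.0 / math.sqrt(1.0 - r ** 4), 0.0, R)


def arc_legendre(R: float) -> float:
    """2^{-1/2} ∫_0^X [(1 - x²/2)(1 - x²)]^{-1/2} dx with X = √2 R / (1 + R²)^{1/2}."""
    X = math.sqrt(2.0) * R / math.sqrt(1.0 + R * R)
    integrand = lambda x: 1.0 / math.sqrt((1.0 - x * x / 2) * (1.0 - x * x))
    return simpson(integrand, 0.0, X) / math.sqrt(2.0)


def quarter_arc() -> float:
    """Arc length from θ = -π/4 to θ = 0 (X = 1), after substituting x = sin φ."""
    integrand = lambda phi: 1.0 / math.sqrt(1.0 - math.sin(phi) ** 2 / 2)
    return simpson(integrand, 0.0, math.pi / 2) / math.sqrt(2.0)


print(arc_polar(0.5), arc_legendre(0.5))
print(quarter_arc())
```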


Another example is the determination of the surface area of an ellipsoid. Suppose
the ellipsoid is described in rectangular coordinates by the equation
$$x^2/a^2 + y^2/b^2 + z^2/c^2 = 1,$$
where $a > b > c > 0$. The total surface area is $8S$, where $S$ is the surface area of the
part contained in the positive octant. In this octant we have
$$z = c[1 - (x/a)^2 - (y/b)^2]^{1/2}$$
and hence
$$1 + (\partial z/\partial x)^2 + (\partial z/\partial y)^2 = [1 - (\alpha x/a)^2 - (\beta y/b)^2]/[1 - (x/a)^2 - (y/b)^2],$$
where
$$\alpha = (a^2 - c^2)^{1/2}/a, \quad \beta = (b^2 - c^2)^{1/2}/b.$$
Consequently
$$S = \int_0^a dx \int_0^{b(1 - (x/a)^2)^{1/2}} [1 - (\alpha x/a)^2 - (\beta y/b)^2]^{1/2}\,[1 - (x/a)^2 - (y/b)^2]^{-1/2}\,dy.$$
If we make the change of variables
$$x = ar\cos\theta, \quad y = br\sin\theta,$$
with Jacobian $J = abr$, we obtain
$$S = ab\int_0^{\pi/2} d\theta \int_0^1 (1 - \sigma r^2)^{1/2}(1 - r^2)^{-1/2}\,r\,dr,$$
where
$$\sigma = \alpha^2\cos^2\theta + \beta^2\sin^2\theta.$$
If we now put
$$u^2 = (1 - r^2)/(1 - \sigma r^2),$$
then $r^2 = (1 - u^2)/(1 - \sigma u^2)$ and
$$r\,dr/du = -(1 - \sigma)u/(1 - \sigma u^2)^2.$$
Hence
$$S = ab\int_0^{\pi/2} d\theta \int_0^1 (1 - \sigma)(1 - \sigma u^2)^{-2}\,du.$$
Inverting the order of integration and giving $\sigma$ its value, we obtain
$$\begin{aligned}
S = ab\int_0^1 du \int_0^{\pi/2} &[(1 - \alpha^2)\cos^2\theta + (1 - \beta^2)\sin^2\theta]\\
&\times [(1 - \alpha^2u^2)\cos^2\theta + (1 - \beta^2u^2)\sin^2\theta]^{-2}\,d\theta.
\end{aligned}$$
It is readily verified that
$$\int_0^{\pi/2} \cos^2\theta\,(m\cos^2\theta + n\sin^2\theta)^{-2}\,d\theta = \pi/4m(mn)^{1/2},$$
$$\int_0^{\pi/2} \sin^2\theta\,(m\cos^2\theta + n\sin^2\theta)^{-2}\,d\theta = \pi/4n(mn)^{1/2}.$$
Thus we obtain finally
$$\begin{aligned}
S = (\pi ab/4)\int_0^1 &[(1 - \alpha^2)/(1 - \alpha^2u^2) + (1 - \beta^2)/(1 - \beta^2u^2)]\\
&\times [(1 - \alpha^2u^2)(1 - \beta^2u^2)]^{-1/2}\,du.
\end{aligned}$$
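The final one-dimensional formula is convenient for computation. The sketch below (illustrative; Simpson's rule is our own choice of quadrature) returns the total area $8S$ for $a \ge b \ge c > 0$. It checks the formula against the classical closed forms for a sphere and for oblate and prolate spheroids, which it must reproduce when two semi-axes coincide.

```python
import math


def simpson(f, lo: float, hi: float, n: int = 4000) -> float:
    """Composite Simpson's rule with n (made even) subintervals."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def ellipsoid_area(a: float, b: float, c: float) -> float:
    """8S with S = (πab/4) ∫_0^1 [(1-α²)/(1-α²u²) + (1-β²)/(1-β²u²)] [(1-α²u²)(1-β²u²)]^{-1/2} du,
    where α² = (a² - c²)/a², β² = (b² - c²)/b², and a >= b >= c > 0."""
    alpha2 = 1.0 - (c / a) ** 2
    beta2 = 1.0 - (c / b) ** 2

    def integrand(u: float) -> float:
        m = 1.0 - alpha2 * u * u  # coefficient of cos²θ in the inner integral
        k = 1.0 - beta2 * u * u   # coefficient of sin²θ
        return ((1.0 - alpha2) / m + (1.0 - beta2) / k) / math.sqrt(m * k)

    return 8.0 * (math.pi * a * b / 4.0) * simpson(integrand, 0.0, 1.0)


def oblate_area(a: float, c: float) -> float:
    """Closed form for a = b > c: 2πa²[1 + (1 - e²) artanh(e)/e], e² = 1 - c²/a²."""
    e = math.sqrt(1.0 - (c / a) ** 2)
    return 2 * math.pi * a * a * (1 + (1 - e * e) * math.atanh(e) / e)


def prolate_area(a: float, b: float) -> float:
    """Closed form for a > b = c: 2πb² + 2πab arcsin(e)/e, e² = 1 - b²/a²."""
    e = math.sqrt(1.0 - (b / a) ** 2)
    return 2 * math.pi * b * b + 2 * math.pi * a * b * math.asin(e) / e


print(ellipsoid_area(1, 1, 1), 4 * math.pi)
print(ellipsoid_area(2, 2, 1), oblate_area(2, 1))
print(ellipsoid_area(2, 1, 1), prolate_area(2, 1))
```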


By an elliptic integral one understands today any integral of the form
$$\int R(x, w)\,dx,$$
where $R(x, w)$ is a rational function of $x$ and $w$, and where $w^2 = g(x)$ is a polynomial
in $x$ of degree 3 or 4 without repeated roots. The elliptic integral is said to be complete
if it is a definite integral in which the limits of integration are distinct roots of $g(x)$.

The case of a quartic is easily reduced to that of a cubic. In the preceding examples
we can simply put $y = x^2$. Thus, for the lemniscate,
$$s(\varphi) = 2^{-1/2}\int_0^{X^2} [4y(1 - y)(1 - y/2)]^{-1/2}\,dy.$$
In general, suppose $g(x) = (x - \alpha)h(x)$, where $h$ is a cubic. If
$$h(x) = h_0(x - \alpha)^3 + h_1(x - \alpha)^2 + h_2(x - \alpha) + h_3$$
and we make the change of variables $x = \alpha + 1/y$, then $g(x) = g^*(y)/y^4$, where
$$g^*(y) = h_0 + h_1y + h_2y^2 + h_3y^3,$$
and
$$\int R(x, w)\,dx = \int R^*(y, v)\,dy,$$
where $R^*(y, v)$ is a rational function of $y$ and $v$, and $v^2 = g^*(y)$.

Since any even power of $w$ is a polynomial in $x$, the integrand can be written in
the form $R(x, w) = (A + Bw)/(C + Dw)$, where $A, B, C, D$ are polynomials in $x$.
Multiplying numerator and denominator by $(C - Dw)w$, we obtain
$$R(x, w) = N/L + M/Lw,$$
where $L = C^2 - D^2g$, $M = (BC - AD)g$ and $N = AC - BDg$ are polynomials in $x$. By decomposing
the rational function $N/L$ into partial fractions its integral can be evaluated in terms of
rational functions and (real or complex) logarithms. By similarly decomposing the rational
function $M/L$ into partial fractions, we are reduced to evaluating the integrals
$$J_0 = \int dx/w, \quad J_n = \int x^n\,dx/w, \quad J_n(\gamma) = \int (x - \gamma)^{-n}\,dx/w,$$
where $n \in \mathbb{N}$ and $\gamma \in \mathbb{C}$.
The argument of the preceding paragraph is actually valid if $w^2 = g$ is any polynomial.
Suppose now that $g$ is a cubic without repeated roots, say
$$g(x) = a_0x^3 + a_1x^2 + a_2x + a_3.$$
By differentiation we obtain, for any integer $m \ge 0$,
$$(x^mw)' = mx^{m-1}w + x^mg'/2w = (2mx^{m-1}g + x^mg')/2w.$$
Since the numerator on the right is the polynomial
$$(2m + 3)a_0x^{m+2} + (2m + 2)a_1x^{m+1} + (2m + 1)a_2x^m + 2ma_3x^{m-1},$$
it follows on integration that
$$2x^mw = (2m + 3)a_0J_{m+2} + (2m + 2)a_1J_{m+1} + (2m + 1)a_2J_m + 2ma_3J_{m-1}.$$
It follows by induction that, for each integer $n > 1$,
$$J_n = p_n(x)w + c_nJ_0 + c_n'J_1,$$
where $p_n(x)$ is a polynomial of degree $n - 2$ and $c_n$, $c_n'$ are constants. Thus the
evaluation of $J_n$ for $n > 1$ reduces to the evaluation of $J_0$ and $J_1$.
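The induction is mechanical and can be carried out with exact rational arithmetic. The sketch below is illustrative (the function names and the test cubic $g(x) = (x + 1)(x + 2)(x + 3) = x^3 + 6x^2 + 11x + 6$ are our own choices). It solves the displayed relation for $J_{m+2}$ repeatedly, recording each $J_n$ as a triple $(p_n, c_n, c_n')$. It then checks numerically that $d(p_nw)/dx + (c_n + c_n'x)/w = x^n/w$, which is what $J_n = p_n(x)w + c_nJ_0 + c_n'J_1$ means after differentiation.

```python
import math
from fractions import Fraction


def poly_eval(p, x: float) -> float:
    """Evaluate p[0] + p[1] x + p[2] x^2 + ... at x."""
    return sum(float(c) * x ** i for i, c in enumerate(p))


def reduce_J(a, n_max: int) -> list:
    """For g = a0 x^3 + a1 x^2 + a2 x + a3 (a0 != 0), return R with R[n] = (p_n, c_n, c'_n),
    meaning J_n = p_n(x) w + c_n J_0 + c'_n J_1, using
    2 x^m w = (2m+3) a0 J_{m+2} + (2m+2) a1 J_{m+1} + (2m+1) a2 J_m + 2m a3 J_{m-1}."""
    a0, a1, a2, a3 = (Fraction(c) for c in a)
    zero, one = Fraction(0), Fraction(1)
    R = [([zero], one, zero), ([zero], zero, one)]  # J_0 and J_1 themselves
    for n in range(2, n_max + 1):
        m = n - 2
        p = [zero] * m + [Fraction(2)]  # the term 2 x^m w
        c = c1 = zero
        for coef, j in ((-(2 * m + 2) * a1, n - 1), (-(2 * m + 1) * a2, n - 2), (-2 * m * a3, n - 3)):
            if j < 0:
                continue
            pj, cj, c1j = R[j]
            p = [(p[i] if i < len(p) else zero) + coef * (pj[i] if i < len(pj) else zero)
                 for i in range(max(len(p), len(pj)))]
            c += coef * cj
            c1 += coef * c1j
        inv = 1 / ((2 * m + 3) * a0)
        R.append(([inv * t for t in p], inv * c, inv * c1))
    return R


def check(a, n: int, x: float, h: float = 1e-5) -> float:
    """Residual d(p_n w)/dx + (c_n + c'_n x)/w - x^n/w at x; should be close to 0."""
    g = lambda t: ((a[0] * t + a[1]) * t + a[2]) * t + a[3]
    p, c, c1 = reduce_J(a, n)[n]
    F = lambda t: poly_eval(p, t) * math.sqrt(g(t))
    dF = (F(x + h) - F(x - h)) / (2 * h)  # central difference for d(p_n w)/dx
    w = math.sqrt(g(x))
    return dF + (float(c) + float(c1) * x) / w - x ** n / w


a = (1, 6, 11, 6)  # g(x) = (x + 1)(x + 2)(x + 3)
print(reduce_J(a, 2)[2])  # J_2 = (2/3) w - (11/3) J_0 - 4 J_1
print([check(a, n, 0.5) for n in range(2, 7)])
```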
Consider now the integral $J_n(\gamma)$. In the same way as before, for any integer $m \ge 1$,
$$\begin{aligned}
d\{(x - \gamma)^{-m}w\}/dx &= -m(x - \gamma)^{-m-1}w + (x - \gamma)^{-m}g'/2w\\
&= \{-2mg + (x - \gamma)g'\}/2w(x - \gamma)^{m+1}.
\end{aligned}$$
We can write
$$g(x) = b_0 + b_1(x - \gamma) + b_2(x - \gamma)^2 + b_3(x - \gamma)^3$$
and the numerator on the right of the previous equation is then
$$-2mb_0 + (1 - 2m)b_1(x - \gamma) + (2 - 2m)b_2(x - \gamma)^2 + (3 - 2m)b_3(x - \gamma)^3.$$
It follows on integration that
$$\begin{aligned}
2(x - \gamma)^{-m}w &= -2mb_0J_{m+1}(\gamma) + (1 - 2m)b_1J_m(\gamma)\\
&\quad + (2 - 2m)b_2J_{m-1}(\gamma) + (3 - 2m)b_3J_{m-2}(\gamma),
\end{aligned}$$
where $J_{-1}(\gamma) = \int (x - \gamma)\,dx/w$ is a constant linear combination of $J_0$ and $J_1$. Since
$g$ does not have repeated roots, $b_1 \ne 0$ if $b_0 = 0$.

It follows by induction that if $g(\gamma) = b_0 \ne 0$ then, for any $n \ge 1$,
$$J_n(\gamma) = q_n((x - \gamma)^{-1})w + d_nJ_1(\gamma) + d_n'J_0 + d_n''J_1,$$
where $q_n(t)$ is a polynomial of degree $n - 1$ and $d_n, d_n', d_n''$ are constants. On the other
hand, if $g(\gamma) = 0$ then $g'(\gamma) = b_1 \ne 0$ and, for any $n \ge 1$,
$$J_n(\gamma) = r_n((x - \gamma)^{-1})w + e_nJ_0 + e_n'J_1,$$
where $r_n(t)$ is a polynomial of degree $n$ and $e_n, e_n'$ are constants.

Thus the evaluation of an arbitrary elliptic integral can be reduced to the evaluation of
$$J_0 = \int dx/w, \quad J_1 = \int x\,dx/w, \quad J_1(\gamma) = \int (x - \gamma)^{-1}\,dx/w,$$
where $w^2 = g$ is a cubic without repeated roots, $\gamma \in \mathbb{C}$ and $g(\gamma) \ne 0$. Following
Legendre (1793), to whom this reduction is due, integrals of these types are called
respectively elliptic integrals of the first, second and third kinds.

The cubic $g$ can itself be simplified. If $\alpha$ is a root of $g$ then, by replacing $x$ by
$x - \alpha$, we may assume that $g(0) = 0$. If $\beta$ is now another root of $g$ then, by replacing
$x$ by $x/\beta$, we may further assume that $g(1) = 0$. Thus the evaluation of an arbitrary
elliptic integral may be reduced to one for which $g$ has the form
$$g_\lambda(x) = 4x(1 - x)(1 - \lambda x),$$
where $\lambda \in \mathbb{C}$ and $\lambda \ne 0, 1$. This normal form, which was used by Riemann (1858)
in lectures, is obtained from the normal form of Legendre by the change of variables
$x = \sin^2\theta$. To draw attention to the difference, it is convenient to call it Riemann’s
normal form.
The range of $\lambda$ can be further restricted by linear changes of variables. The
transformation $y = (1 - \lambda x)/(1 - \lambda)$ replaces Riemann’s normal form by one of the same
type with $\lambda$ replaced by $U\lambda = 1 - \lambda$. Similarly, the transformation $y = 1 - \lambda x$ replaces
Riemann’s normal form by one of the same type with $\lambda$ replaced by $V\lambda = 1/(1 - \lambda)$.
The transformations $U$ and $V$ together generate a group $\mathcal{G}$ of order 6 (isomorphic to
the symmetric group $\mathfrak{S}_3$ of all permutations of three letters), since
$$U^2 = V^3 = (UV)^2 = I.$$
The values of $\lambda$ corresponding to the elements $I, V, V^2, U, UV, UV^2$ of $\mathcal{G}$ are
$$\lambda, \quad 1/(1 - \lambda), \quad (\lambda - 1)/\lambda, \quad 1 - \lambda, \quad \lambda/(\lambda - 1), \quad 1/\lambda.$$
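These relations are easy to confirm with exact rational arithmetic. The sketch below is an illustration only. It composes maps so that $UV$ means "first $V$, then $U$", the convention under which the list of values above is correct. It evaluates the six maps at a few sample values of $\lambda$ (arbitrary choices avoiding $0$ and $1$), checks $U^2 = V^3 = (UV)^2 = I$, and checks that the composite of any two of the six maps is again one of them.

```python
from fractions import Fraction


def U(lam):
    return 1 - lam


def V(lam):
    return 1 / (1 - lam)


def compose(*maps):
    """compose(F, G, ...)(lam) = F(G(...(lam))): the rightmost map acts first."""
    def h(lam):
        for f in reversed(maps):
            lam = f(lam)
        return lam
    return h


group = {
    "I": compose(), "V": compose(V), "V^2": compose(V, V),
    "U": compose(U), "UV": compose(U, V), "UV^2": compose(U, V, V),
}
closed_forms = {
    "I": lambda l: l, "V": lambda l: 1 / (1 - l), "V^2": lambda l: (l - 1) / l,
    "U": lambda l: 1 - l, "UV": lambda l: l / (l - 1), "UV^2": lambda l: 1 / l,
}
samples = [Fraction(2, 7), Fraction(-3, 5), Fraction(9, 4)]


def signature(f):
    """Values at the sample points; these are enough to tell the six maps apart."""
    return tuple(f(s) for s in samples)


print(all(signature(group[k]) == signature(closed_forms[k]) for k in group))  # True
```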
The region $\mathcal{F}$ of the complex plane $\mathbb{C}$ defined by the inequalities
$$|\lambda - 1| < 1, \quad 0 < \operatorname{Re}\lambda < 1/2,$$
is a fundamental domain for the group $\mathcal{G}$; i.e., no point of $\mathcal{F}$ is mapped to a different
point of $\mathcal{F}$ by an element of $\mathcal{G}$ and each point of $\mathbb{C}$ is mapped to a point of $\mathcal{F}$ or its
boundary $\partial\mathcal{F}$ by some element of $\mathcal{G}$. Consequently the sets $\{G(\mathcal{F}) : G \in \mathcal{G}\}$ form a
tiling of $\mathbb{C}$; i.e.,
$$\mathbb{C} = \bigcup_{G \in \mathcal{G}} \{G(\mathcal{F}) \cup G(\partial\mathcal{F})\}, \qquad G(\mathcal{F}) \cap G'(\mathcal{F}) = \emptyset \ \text{ if } G, G' \in \mathcal{G} \text{ and } G \ne G'.$$
This is illustrated in Figure 1, where the set $G(\mathcal{F})$ is represented simply by the group
element $G$ and, in particular, $\mathcal{F}$ is represented by $I$. It follows that in Riemann’s
normal form we may suppose $\lambda \in \mathcal{F} \cup \partial\mathcal{F}$.

The changes of variable in the preceding reduction to Riemann’s normal form may
be complex, even though the original integrand was real. It will now be shown that any
real elliptic integral can be reduced by a real change of variables to one in Riemann’s
normal form, where $0 < \lambda < 1$ and the independent variable is restricted to the interval
$0 < x < 1$.

Fig. 1. Fundamental domain for $\mathcal{G}$.
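If $g$ is a cubic or quartic with only real roots, this can be achieved by a linear fractional
transformation, mapping roots of $g$ to roots of $g_\lambda$. Appropriate transformations
are listed in Tables 1 and 2. It should be noted that $\lambda$ is always a cross-ratio of the four
roots of $g$ in Table 2, and that $\lambda$ is always a cross-ratio of the three roots of $g$ and the
point ‘$\infty$’ in Table 1.

Table 1. Reduction to Riemann’s normal form, $g$ a cubic with all roots real

$dx/g(x)^{1/2} = dy/\mu g_\lambda(y)^{1/2}$
$g(x) = A(x - \alpha_1)(x - \alpha_2)(x - \alpha_3)$, where $\alpha_1 > \alpha_2 > \alpha_3$; $\alpha_{jk} = \alpha_j - \alpha_k$
$g_\lambda(y) = 4y(1 - y)(1 - \lambda y)$, where $0 < \lambda < 1$, $y \in (0, 1)$
$\mu = (|A|\alpha_{13})^{1/2}/2$, $\lambda_0 = \alpha_{23}/\alpha_{13}$, $1 - \lambda_0 = \alpha_{12}/\alpha_{13}$

$$\begin{array}{ccccc}
\operatorname{sgn} A & \lambda & \text{Range} & \text{Transformation} & \text{Corresponding values } (x \mapsto y)\\
\hline
+1 & \lambda_0 & x \ge \alpha_1 & y = (x - \alpha_1)/(x - \alpha_2) & \infty \mapsto 1,\ \alpha_1 \mapsto 0\\
-1 & 1 - \lambda_0 & \alpha_2 \le x \le \alpha_1 & y = \alpha_{13}(x - \alpha_2)/\alpha_{12}(x - \alpha_3) & \alpha_1 \mapsto 1,\ \alpha_2 \mapsto 0\\
+1 & \lambda_0 & \alpha_3 \le x \le \alpha_2 & y = (x - \alpha_3)/\alpha_{23} & \alpha_2 \mapsto 1,\ \alpha_3 \mapsto 0\\
-1 & 1 - \lambda_0 & x \le \alpha_3 & y = \alpha_{13}/(\alpha_1 - x) & \alpha_3 \mapsto 1,\ -\infty \mapsto 0
\end{array}$$

Table 2. Reduction to Riemann’s normal form, $g$ a quartic with all roots real

$dx/g(x)^{1/2} = dy/\mu g_\lambda(y)^{1/2}$
$g(x) = A(x - \alpha_1)(x - \alpha_2)(x - \alpha_3)(x - \alpha_4)$, where $\alpha_1 > \alpha_2 > \alpha_3 > \alpha_4$; $\alpha_{jk} = \alpha_j - \alpha_k$
$g_\lambda(y) = 4y(1 - y)(1 - \lambda y)$, where $0 < \lambda < 1$, $y \in (0, 1)$
$\mu = (|A|\alpha_{13}\alpha_{24})^{1/2}/2$, $\lambda_0 = \alpha_{23}\alpha_{14}/\alpha_{13}\alpha_{24}$, $1 - \lambda_0 = \alpha_{12}\alpha_{34}/\alpha_{13}\alpha_{24}$

$$\begin{array}{ccccc}
\operatorname{sgn} A & \lambda & \text{Range} & \text{Transformation} & \text{Corresponding values } (x \mapsto y)\\
\hline
+1 & \lambda_0 & x \ge \alpha_1 & y = \alpha_{24}(x - \alpha_1)/\alpha_{14}(x - \alpha_2) & \infty \mapsto \alpha_{24}/\alpha_{14},\ \alpha_1 \mapsto 0\\
-1 & 1 - \lambda_0 & \alpha_2 \le x \le \alpha_1 & y = \alpha_{13}(x - \alpha_2)/\alpha_{12}(x - \alpha_3) & \alpha_1 \mapsto 1,\ \alpha_2 \mapsto 0\\
+1 & \lambda_0 & \alpha_3 \le x \le \alpha_2 & y = \alpha_{24}(x - \alpha_3)/\alpha_{23}(x - \alpha_4) & \alpha_2 \mapsto 1,\ \alpha_3 \mapsto 0\\
-1 & 1 - \lambda_0 & \alpha_4 \le x \le \alpha_3 & y = \alpha_{13}(x - \alpha_4)/\alpha_{34}(\alpha_1 - x) & \alpha_3 \mapsto 1,\ \alpha_4 \mapsto 0\\
+1 & \lambda_0 & x \le \alpha_4 & y = \alpha_{24}(x - \alpha_1)/\alpha_{14}(x - \alpha_2) & \alpha_4 \mapsto 1,\ -\infty \mapsto \alpha_{24}/\alpha_{14}
\end{array}$$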
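Each row of Tables 1 and 2 can be spot-checked numerically: for $x$ inside the stated range one should have $0 < y < 1$ and $dy/dx = \mu\,g_\lambda(y)^{1/2}/g(x)^{1/2}$. The sketch below does this for one assumed choice of roots in each table (cubic roots $3, 1, -2$; quartic roots $4, 2, 1, -1$; $|A| = 2$), using a central difference for $dy/dx$. The helper names and sample points are illustrative.

```python
import math


def g_lambda(y: float, lam: float) -> float:
    """Riemann's normal form 4y(1 - y)(1 - λy)."""
    return 4 * y * (1 - y) * (1 - lam * y)


def row_ok(A, roots, lam, mu, transform, xs, h=1e-6) -> bool:
    """Check 0 < y < 1 and dy/dx = mu * sqrt(g_lambda(y) / g(x)) at each sample x."""
    g = lambda x: A * math.prod(x - r for r in roots)
    for x in xs:
        y = transform(x)
        dydx = (transform(x + h) - transform(x - h)) / (2 * h)
        if not (0 < y < 1 and math.isclose(dydx, mu * math.sqrt(g_lambda(y, lam) / g(x)), rel_tol=1e-6)):
            return False
    return True


# Table 1: cubic with roots a1 > a2 > a3 and |A| = 2
a1, a2, a3 = 3.0, 1.0, -2.0
al12, al13, al23 = a1 - a2, a1 - a3, a2 - a3
lam0 = al23 / al13
mu1 = math.sqrt(2 * al13) / 2
table1 = [
    (+2, lam0, lambda x: (x - a1) / (x - a2), [4.0, 10.0]),                       # x >= a1
    (-2, 1 - lam0, lambda x: al13 * (x - a2) / (al12 * (x - a3)), [1.5, 2.5]),    # a2 <= x <= a1
    (+2, lam0, lambda x: (x - a3) / al23, [-1.0, 0.5]),                           # a3 <= x <= a2
    (-2, 1 - lam0, lambda x: al13 / (a1 - x), [-5.0, -3.0]),                      # x <= a3
]
ok1 = all(row_ok(A, (a1, a2, a3), lam, mu1, T, xs) for A, lam, T, xs in table1)

# Table 2: quartic with roots q1 > q2 > q3 > q4 and |A| = 2
q1, q2, q3, q4 = 4.0, 2.0, 1.0, -1.0
b12, b13, b14, b23, b24, b34 = q1 - q2, q1 - q3, q1 - q4, q2 - q3, q2 - q4, q3 - q4
lam0q = b23 * b14 / (b13 * b24)
mu2 = math.sqrt(2 * b13 * b24) / 2
table2 = [
    (+2, lam0q, lambda x: b24 * (x - q1) / (b14 * (x - q2)), [5.0, 9.0]),         # x >= q1
    (-2, 1 - lam0q, lambda x: b13 * (x - q2) / (b12 * (x - q3)), [2.5, 3.5]),     # q2 <= x <= q1
    (+2, lam0q, lambda x: b24 * (x - q3) / (b23 * (x - q4)), [1.3, 1.8]),         # q3 <= x <= q2
    (-2, 1 - lam0q, lambda x: b13 * (x - q4) / (b34 * (q1 - x)), [-0.5, 0.5]),    # q4 <= x <= q3
    (+2, lam0q, lambda x: b24 * (x - q1) / (b14 * (x - q2)), [-3.0, -2.0]),       # x <= q4
]
ok2 = all(row_ok(A, (q1, q2, q3, q4), lam, mu2, T, xs) for A, lam, T, xs in table2)
print(ok1, ok2)  # True True
```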
Suppose now that $g$ is a real cubic or quartic with a pair of conjugate complex
roots. Then we can write
$$g(x) = Q_1Q_2 = (a_1x^2 + 2b_1x + c_1)(a_2x^2 + 2b_2x + c_2),$$
where the coefficients are real, $a_1c_1 - b_1^2 > 0$ and $a_2c_2 - b_2^2 \ne 0$, but $a_2$ may be zero.
Consider first the case where $a_2 \ne 0$ and $b_1 = b_2a_1/a_2$. Then
$$Q_1 = a_1(x + b_1/a_1)^2 + b_1', \quad Q_2 = a_2(x + b_1/a_1)^2 + b_2',$$
where
$$b_1' = (a_1c_1 - b_1^2)/a_1, \quad b_2' = (a_2c_2 - b_2^2)/a_2.$$
If we put $y = (x + b_1/a_1)^2$, then
$$R(x) = R_1(y) + R_2(y)\,y^{1/2},$$
where the rational functions $R_1$, $R_2$ are determined by the rational function $R$, and
$$dx/g(x)^{1/2} = \pm dy/2[y(a_1y + b_1')(a_2y + b_2')]^{1/2}.$$
Thus we are reduced to the case of a cubic with 3 distinct real roots.

In the remaining cases there exist distinct real values $s_1$, $s_2$ of $s$ such that the
polynomial $Q_1 + sQ_2$ is proportional to a perfect square. For $Q_1 + sQ_2$ is proportional to
a perfect square if
$$D(s) := (a_1 + sa_2)(c_1 + sc_2) - (b_1 + sb_2)^2 = 0.$$
We have $D(0) = a_1c_1 - b_1^2 > 0$. If $a_2 = 0$, then $b_2 \ne 0$ and $D(\pm\infty) = -\infty$. On the
other hand, if $a_2 \ne 0$, then $D(-a_1/a_2) < 0$, since $b_1 \ne b_2a_1/a_2$, and $D(s)$ has the
sign of $a_2c_2 - b_2^2$ for both large positive and large negative $s$. Thus the quadratic $D(s)$
has distinct real roots $s_1$, $s_2$. Hence
$$Q_1 + s_1Q_2 = (a_1 + s_1a_2)(x + d_1)^2, \quad Q_1 + s_2Q_2 = (a_1 + s_2a_2)(x + d_2)^2,$$
where $a_1 + s_ja_2 \ne 0$ $(j = 1, 2)$ and
$$d_1 = (b_1 + s_1b_2)/(a_1 + s_1a_2), \quad d_2 = (b_1 + s_2b_2)/(a_1 + s_2a_2).$$
Consequently
$$Q_1 = A_1(x + d_1)^2 + B_1(x + d_2)^2, \quad Q_2 = A_2(x + d_1)^2 + B_2(x + d_2)^2,$$
where
$$\begin{aligned}
A_1 &= -s_2(a_1 + s_1a_2)/(s_1 - s_2), & B_1 &= s_1(a_1 + s_2a_2)/(s_1 - s_2),\\
A_2 &= (a_1 + s_1a_2)/(s_1 - s_2), & B_2 &= -(a_1 + s_2a_2)/(s_1 - s_2).
\end{aligned}$$
If we put $y = \{(x + d_1)/(x + d_2)\}^2$, then
$$R(x) = R_1(y) + R_2(y)\,y^{1/2},$$
where again the rational functions $R_1$, $R_2$ are determined by the rational function $R$,
and
$$dx/g(x)^{1/2} = \pm dy/2|d_2 - d_1|[y(A_1y + B_1)(A_2y + B_2)]^{1/2}.$$
Thus we are again reduced to the case of a cubic with 3 distinct real roots.

The preceding argument may be applied also when $g$ has only real roots, provided
the factors $Q_1$ and $Q_2$ are chosen so that their zeros do not interlace. Suppose (without
loss of generality) that $g = g_\lambda$ is in Riemann’s normal form and take
$$Q_1 = (1 - x)(1 - \lambda x), \quad Q_2 = 4x.$$
In this case we can write
$$\begin{aligned}
Q_1 &= \sqrt{\lambda}\,\{(1 + \sqrt{\lambda})^2(x - 1/\sqrt{\lambda})^2 - (1 - \sqrt{\lambda})^2(x + 1/\sqrt{\lambda})^2\}/4,\\
Q_2 &= -\sqrt{\lambda}\,\{(x - 1/\sqrt{\lambda})^2 - (x + 1/\sqrt{\lambda})^2\}.
\end{aligned}$$
If we put
$$1 - 4\sqrt{\lambda}\,y/(1 + \sqrt{\lambda})^2 = \{(x - 1/\sqrt{\lambda})/(x + 1/\sqrt{\lambda})\}^2,$$
we obtain
$$dx/g_\lambda(x)^{1/2} = dy/\mu g_\rho(y)^{1/2},$$
where
$$\mu = 1 + \sqrt{\lambda}, \quad \rho = 4\sqrt{\lambda}/(1 + \sqrt{\lambda})^2.$$
The usefulness of this change of variables will be seen in the next section.
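Here is a quick numerical check of this substitution, offered only as an illustration. Solving for $y$ gives $y = (1 - t^2)/\rho$ with $t = (x - 1/\sqrt\lambda)/(x + 1/\sqrt\lambda)$, which maps $0 \le x \le 1$ onto $0 \le y \le 1$. One can verify pointwise that $dy/dx = \mu\,g_\rho(y)^{1/2}/g_\lambda(x)^{1/2}$, and after integration this gives $\int_0^1 dx/g_\lambda(x)^{1/2} = \mu^{-1}\int_0^1 dy/g_\rho(y)^{1/2}$. Both sides are computed below after the substitution $x = \sin^2\phi$, which turns each into $\int_0^{\pi/2}(1 - \lambda\sin^2\phi)^{-1/2}\,d\phi$. The value $\lambda = 0.3$ and the helper names are arbitrary assumptions.

```python
import math


def simpson(f, lo: float, hi: float, n: int = 1000) -> float:
    """Composite Simpson's rule with n (made even) subintervals."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def complete(lam: float) -> float:
    """∫_0^1 dx / g_λ(x)^{1/2} = ∫_0^{π/2} (1 - λ sin²φ)^{-1/2} dφ, via x = sin²φ."""
    return simpson(lambda phi: 1.0 / math.sqrt(1.0 - lam * math.sin(phi) ** 2), 0.0, math.pi / 2)


def g(y: float, l: float) -> float:
    """Riemann's normal form 4y(1 - y)(1 - l y)."""
    return 4 * y * (1 - y) * (1 - l * y)


lam = 0.3
s = math.sqrt(lam)
mu = 1 + s
rho = 4 * s / (1 + s) ** 2


def y_of_x(x: float) -> float:
    """Solve 1 - rho*y = t², t = (x - 1/√λ)/(x + 1/√λ), for y."""
    t = (x - 1 / s) / (x + 1 / s)
    return (1 - t * t) / rho


x, h = 0.4, 1e-6
dydx = (y_of_x(x + h) - y_of_x(x - h)) / (2 * h)
print(dydx, mu * math.sqrt(g(y_of_x(x), rho) / g(x, lam)))
print(complete(lam), complete(rho) / mu)
```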


2 The Arithmetic-Geometric Mean 
Let $a$ and $b$ be positive real numbers, with $a > b$, and let
$$a_1 = (a + b)/2, \quad b_1 = (ab)^{1/2}$$
be respectively their arithmetic and geometric means. Then
$$a_1 < (a + a)/2 = a, \quad b_1 > (bb)^{1/2} = b,$$
and
$$a_1 - b_1 = (a^{1/2} - b^{1/2})^2/2 > 0.$$
Thus $a_1$, $b_1$ satisfy the same hypotheses as $a$, $b$ and the procedure can be repeated. If
we define sequences $\{a_n\}$, $\{b_n\}$ inductively by
$$a_0 = a, \quad b_0 = b, \quad a_{n+1} = (a_n + b_n)/2, \quad b_{n+1} = (a_nb_n)^{1/2} \quad (n = 0, 1, \dots),$$
then
$$0 < b_0 < b_1 < b_2 < \cdots < a_2 < a_1 < a_0.$$
It follows that $a_n \to \lambda$ and $b_n \to \mu$ as $n \to \infty$, where $\lambda \ge \mu > 0$. In fact $\lambda = \mu$, as
one sees by letting $n \to \infty$ in the relation $a_{n+1} = (a_n + b_n)/2$. The convergence of
the sequences $\{a_n\}$ and $\{b_n\}$ to their common limit is extremely rapid, since
$$a_n - b_n = (a_{n-1} - b_{n-1})^2/8a_{n+1}.$$
(As an example, if $a = \sqrt{2}$ and $b = 1$, calculation shows that $a_4$ and $b_4$ differ by only
one unit in the 20th decimal place.)

The common limit of the sequences $\{a_n\}$ and $\{b_n\}$ will be denoted by $M(a, b)$.
The definition can be extended to arbitrary positive real numbers $a$, $b$ by putting
$$M(a, a) = a, \quad M(b, a) = M(a, b).$$


Following Gauss (1818), M(a, b) is known as the arithmetic-geometric mean of a 
and b. However, the preceding algorithm, which we will call the AGM algorithm, was 
first introduced by Lagrange (1784/5), who showed that it had a remarkable applica- 
tion to the numerical calculation of arbitrary elliptic integrals. The first tables of elliptic 
integrals, which made them as accessible as logarithms, were constructed in this way 
under the supervision of Legendre (1826). Today the algorithm can be used directly by 
electronic computers. 

By putting 1 — 2x = t*/a? in Riemann’s normal form, it may be seen that any real 
elliptic integral may be brought to the form 


/ p(t)[(a? — 17)? — b*) 7dr, 


where (ft) is a rational function of t? with real coefficients, a > b > O andt € [b, a]. 
We will restrict attention here to the complete elliptic integral 


i i " gla? — 22 — by -"at, 
b 


but at the cost of some complication the discussion may be extended to incomplete 
elliptic integrals (where the interval of integration is a proper subinterval of [b, a]). 
If we make the change of variables 


t? =a’sin?0+b?cos*O (0<0<2/2), 
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then 
tdt/d@ = (a* — b”) sinO cos6 = [(a? — t”)(t* — b”)]'/” 


and 
n/2 
J - | (a? sin? 6 + b? cos” 6)'/*)d0/(a? sin’ 6 + b? cos” 6)'/”, 
0 


Now put 
ty = (1/2)(t + ab/t) 
and, as before, 
aj =(a+b)/2,  b; = (ab)'”. 
Then 
a? — #7 = (a? — 7)? — b*)/40, 
1? — b* = (t? — ab)* /4t”, 


dt, /dt = (t? — ab)/20°. 


As t increases from b to b1, t} decreases from a, to bj, and as ¢ further increases from 
b, to a, t; increases from b; back to a,. Since 


t=t+(t? — bi), 


it follows from these observations that 


| * pla? — 2)¢2 — by -"2at = | " winlla? — 2)02 — by-2an, 


by 


where 
w(t) = (1/2)koL (A + f= BU) 1+ olla — Gf - bp T 
In particular, if we take g(t) = 1 and put 
HE (a,b) = iG — Py? — by ae, 
we obtain 
KE (a, b) = KH (a, bi). 


Hence, by repeating the process, .% (a, b) = # (an, bn). But 


m/2 
Han, bn) = f (a; sin’ 0 + by cos” 6)~'/7d0 
0 
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and 
by < (a2 sin? 0 + be cos” gy! /2 < dy. 
Consequently, by letting n — oo we obtain 


KH (a,b) = 2/2M(a, bd). 


Now take g(t) = a — 1? and put 


&(a, b) := y (a? = Pye — bdr. 
In this case 
w(t) = (a — b*)/2 + (aj — tf) 
and hence 
&(a, b) = (a” — b’).# (a, b)/2 + 2E (ay, b1). 
If we write 
en = 2"(a2 — 62) 


then, since #% (a, b) = # (an, bn), by repeating the process we obtain 


E (a, b)/H (a, b) = (eg ter +++ + €n—-1)/2 + 2"E (an, bn) /H (an, bn). 


But 
a /2 
2” E (an, bn) = en | cos” O(a? sin? 0 + b cos” 0)~!/*d@ 
0 


and e, — 0 (rapidly) as n > oo, since 
en = 2" (an-1 = bn—1)"/4 = €n—1(An—1 = bn—1)/4an. 
Hence 


E (a, b)/H (a,b) = (eo te, +e24+---)/2. 
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(1) 


(2) 


To avoid taking differences of nearly equal quantities, the constants e, may be calcu- 


lated by means of the recurrence relations 
—— 4/2 a (n= 1,2,...). 
Next take 
p(t) = pl(p* —a°)(p? — b*)N"/(p* — 1°), 


where either p > a or 0 < p < b, and put 


Pa, b, p):= [ vtce? = yp? = Py'Pary(p? = Pye - — 2)? — by}. 
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In this case 
w(t) = 41 + pil (py — af) (pj — OD)"; - th), 


where 


pi = (1/2)(p + ab/p), 
qi = (Pt — aq)! = [(p? — a*)(p? — b*)17/2p, 


and the + or — sign is taken according as p > a or0O < p < b. Since p; > ay in either 
event, without loss of generality we now assume that p > a. Then also p, < p and 


Pa, b, p) = qi (a,b) + Alar, bi, pi). 
Define the sequence {p,} inductively by 
PO= PP, Pati = (1/2)(Pn + Gnbn/ Pn) (n =0,1,...), 
and put 
dnt = (Pag — Angi)” = [Pn = an) (Pn = On)? /2Pn- 


Then py > v > M(a,b) asn > o, since dy < Pn < Pn—1. In factv = M(a, bd), 
as one sees by letting n — oo in the recurrence relation defining the sequence {p,}. 
Moreover 


On i= (a? — b?)/(p2 - @) > 0asn-> wo, 


since 


b2 


qr 


at OG 
On+1 = On 2 72 _ hd 
Aan) Pi 


= 2 2 
=) < On Pn /44n 41: 
n 


Hence 
(Pn — bn)/(Pa = @q) = 1+ bn > I 
Since P(dan, bn, Pn) 


gi? eG me [ __(ay sin” 8 + by cos? Oy 1*d0 
nia . 0 © (pz-a?) sin? 6 + (p2 — b?) cos? 6 
it follows that A(ay, bn, Pn) > «/2.asn — oo. Hence 
Pa,b, p)=(gitqat:-::)(a,b)+ 7/2. (3) 


To avoid taking differences of nearly equal quantities, the constants g,, may be calcu- 
lated by means of the recurrence relations 


Ont = On Pa/4ap4,41 +n). Gnt1 = (1+ dn)'?G2/2pn (n = 1,2,...). 
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Using (1)-(3), complete elliptic integrals of all three kinds can be calculated by 
the AGM algorithm. We now consider another application, the utility of which will be 
seen in §6. 

By putting t] = (1/2)(t + ab/t) again, one sees that 


[oe [oe 
fe -a@ye yar =a/y [ee ae - yr 'Pan. 
a a\ 
But the change of variables wu = a(1 — b*/t7)!/? shows that 
(oe) 
fl [2 = a2)? = PY V2dt = H (a, 0), 
a 
where c = (a* — b*)'/?. It follows that 


KH (a,c) = # (ay, c1)/2 = +++ = H (an, €n)/2”, 


where Cy = (a? — py 2 The asymptotic behaviour of % (ay, Cy) may be determined 
in the following way. 
If we put s = ac/t, then s decreases from a to c as ¢ increases from c to a, and 


ds/dt = —[(a? — s*)(s* — *?)'?/1(@ — P)(? — 7)". 


Since s = t when t = h := (ac)!/?, it follows that 


h 
KH (a,c) = 2 | fear Se) 7d, 
c 
But, force < t <A, 
bo) = (a2 — cy 1/2 < (a2 a 12)~1/2 < (a — py 1/? =a i= c/a)~'/?, 
Hence 
ob = Hae) = et —cfay “1, 


where 
h 
= / (0? — c*)'?dt = log{(a/c)'/? + (a/c — 1)'"7}. 


If we now replace a,b,c by an, bn, Cn then, since a,/cy — oo and moreover 
an, by, > M(a, b), we deduce that 
2" # (a, c)/log(4an/cn) > 1/M(a, b) = 24 (a, b)/z. 


But 4ay/cn = (4an/cen—1), since Cp = (dn—1 — bn—1)/2, and hence 


2” log(4an/cn) 
= 2!—" log(4an—1/cn—1) — 2!" log(an—1/an) 


= log(4ag/co) — log(ag/a1) — 27! log(ay /az) — «+» — 2!“ log(an—1/an). 
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It follows that 


mH (a, c)/2H (a, b) = log(4ai/co) — D2 log(an/an+1). 


n=1 
Finally, to determine & (a, c) we can use the relation 
HK (a, b)E(a, c) + H (a, )E(a, b) — a? H (a, b) XH (a,c) = a /2. 


By homogeneity we need only establish this relation for a = 1. Since 
1 
H(A, —aA)'/?) = [4x(1 — x)(1 — Ax)]71/7dx, 
0 


1 
B(1, (1 — ay") = | [( — Ax)/4x(1 — x)]"2ax, 
0 


(4) 


it is in fact equivalent to the following relation, due to Legendre, between the complete 


elliptic integrals of the first and second kinds: 


Proposition 1 /f 


1 1 
K(a= | [4x(1 — x)(1 — Ax)]7'/*dx, EQ) = | [1 — Ax)/4x(1 — x)]!/7dx, 
0 0 


then 
KQ)EQA-A+KQ0-AEQ)-KQ)KUA-A)=2/2 for0O<1 <1. 


Proof We show first that the derivative of the left side of (5) is zero. Evidently 
1 
dE/di = -«/2) | x[4x(1 — x)(1 — Ax)]7!/*dx = [E() — K(A)]/22. 
0) 
Similarly, 
1 
dK/di = aya | x(1 — Ax) [4x1 — x) — Ax) dx. 
0 
Substituting x = (1 — u)/(1 — Au) and writing 2’ = 1 — 1, we obtain 
1 
dK/di = ayaa’) [ [1 —u)/4u(1 — Au)]'/*du 
0 
= [E(A) —d/K(A)]/2a2". 
It follows that 
d(Ad'dK /dd)/dd = K/A. 


Thus yj (2) = K (A) is a solution of the second order linear differential equation 


d(Ad'dy/dd)/da — y/4=0. 


(5) 


(6) 
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By symmetry, y2(A) = K (1’) is also a solution. It follows that the ‘Wronskian’ 
W = Al'(yady,/dd — yidyy/da) 
has derivative zero and so is constant. But, writing 
K'(A)=K(1-A), E'V)=E-A), 
we have 
2W = K'(E—-4K)4+ K(E' —iK')=KE'+ K'(E—K). 


To evaluate this constant we let 2 > 0. Putting x = sin? 6, we obtain 


x /2 m/2 
K(d) -| (1 —Asin2@)~'/24a0, E(A) >) (1 — Asin? 6)!/2d0 
0 0 


and hence, as 1 > 0, 


K(4) > 2/2, EA) 9 2/2, EV’) > 1. 


Moreover 
K(A)[E() — K(4)] > 0, 
since 
K(A) — EQ) = af x[4x(1 — x)(1 — Ax)]7!/7dx = O(A) 
and 


a /2 
0< K()< | 1 —d—ar!2a0 = oc"), 
0 


It follows that2W = 7/2. 
If 2 = 1/2, then 2’ = 1 and (5) takes the simple form 


K (1/2)[2E(1/2) — K(1/2)] = x/2. 
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By the remarks preceding the statement of Proposition 1, the left side can be 
evaluated by the AGM algorithm. In this way z has recently been calculated to 
millions of decimal places. (It will be recalled that the value 2 = 1/2 occurred in 


the rectification of the lemniscate.) 


3 Elliptic Functions 


According to Jacobi, the theory of elliptic functions was conceived on 23 December 
1751, the day on which the Berlin Academy asked Euler to report on the Produzioni 
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Matematiche of Count Fagnano, a copy of which had been sent them by the author. 
The papers which aroused Euler’s interest had in fact already appeared in an obscure 
Italian journal between 1715 and 1720. Fagnano had shown first how a quadrant of a 
lemniscate could be halved, then how it could be divided algebraically into 2”, 3 - 2” 
or 5 - 2” equal parts. He had also established an algebraic relation between the length 
of an elliptic arc, the length of another suitably chosen arc and the length of a quadrant. 
By analysing and extending his arguments, Euler was led ultimately (1761) to a general 
addition theorem for elliptic integrals. An elegant proof of Euler’s theorem was given 
by Lagrange (1768/9), using differential equations. We follow this approach here. 
Let 


g(x) = 4x(1 —x)(1 — Ax) = 44x39 — 4(1 + Ax? 4.4 
be Riemann’s normal form and let 2 f; (x) be its derivative: 
fi(x) = 6Ax? — A(1 + A)x +2. 


By the fundamental existence and uniqueness theorem for ordinary differential equa- 
tions, the second order differential equation 


x" = fi(x) 7) 


has a unique solution S(t) = S(t, 4), defined (and holomorphic) for |t| sufficiently 
small, which satisfies the initial conditions 


S(0) = S’(0) = 0. (8) 
The solution S(t, 2) is an elementary function if 2 = 0 or 1: 
S(t,0) = sin*t, S(t, 1) = tanh’ ¢. 


(For other values of 2, S(t) coincides with the Jacobian elliptic function st.) 
Evidently S(t) is an even function of ft, since S(—f) is also a solution of (7) and 
satisfies the same initial conditions (8). 
For any solution x(t) of (7), the function x’(t)* — g,[x(t)] is a constant, since its 
derivative is zero. In particular, 


S(t)? = gil SI, (9) 


since both sides vanish for t = 0. 
If |c| is sufficiently small, then x(t) = S(t+7) and x2(t) = S(t—T) are solutions 
of (7) near t = 0. Moreover, 


a0) =—mig@l G=1,2), 
since these relations hold for t = 0. From 
(x1%5 + x}x2) = x1 fa (x2) + x2 F401) + 2x45 
and 


(x1x5 + x}x2)? = x294 (x2) + x292(x1) + 2x1x0x) x5 
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we obtain 
2x1 x2(%1x4 + Xia) - (x1x5 + aan) - 2%1X2X 1X5 
= x} (2x2 fi(2) — ga (2)} +243 {2x1 far) — g201)}. 

But if g,(x) = ax3 + Bx? + yx and fi(x) = g/,(x)/2, then 

2x fix) — g(x) = x°Qax + f). 
Hence 

Qxyx2 (xix +x} x2)! — (xix) + x4x2)? = 2xPxFfa(xr + x2) + B} + 2x1x2x} 25. 

On the other hand, 


(xy — x5) Crp xh + x4x2) = x28, (01) — x194(%2) + (x1 — x2) x5 
= xyx2(xy — x2){a (x1 +02) + B} + Or — x2)xp x9. 
Comparing these two relations, we obtain 
{2x1 x2(x1 x4 + x}x2)! — (Cxix5 + x4x2)"}(x1 — x2) = 2xyx2(x} — x5) (x1x5 + .X}x2). 
If we divide by 2x1x2(x1 — x2)(x1x45 + x} x2), this takes the form 


/ — 


(X1%, FH, x0) FIR, aD kh 


ihn Are HD 2x1x2 xy — x9” 
which can be integrated to give 

(xix), + x}x2)? = C(t) x1x2(x1 — x2)*, 
where the constant C(z) depends on rt. Equivalently, 


[S(u)S'(v) — S’(u) SQ)P = Cu + 0)/2)S(u)SCO)LS@) — SOP. 


To evaluate the constant, we divide throughout by S(v) and let v > 0. By (9), this 
yields C(u/2) = y /S(u). Since y = 4 (for Riemann’s normal form), we obtain finally 


S(u +0) = 48(u)S(0)[S(u) — S)P/LS@)S'(v) — SW) SO)P. (10) 


Thus S(u + v) is a rational function of S(u), S(v), S’(u), S’(v). Moreover, since 
(S’)? = g,(S), there exists a polynomial p(x, y, z), not identically zero and with coef- 
ficients independent of u and v, such that p[S(u+v), S(u), S(v)] = 0. In other words, 
the function S(u) has an algebraic addition theorem. 

The relation (10) can also be written in the form 


S(u + v) = [S(u)S'(v) + S'(u) S(v) 7 /4S(u)S(v) [1 — AS(U)S(0) PP, A) 
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since 


S(u)’S' (vy? — S'(uy’ Sv)? = Su)? g[S(v)] — Sv) gilS@)] 
= 4S(u)S(v)[S(u) — S(v) [1 — AS(u)S(v)]. 


Replacing v by —o in (11) and subtracting the result from (11), we obtain 
S(u + v) — S(u—v) = S’(u)S'(v)/[1 — AS(u) S(v) P. (12) 
In particular, for o = u, 
S(Qu) = galS(u)I/[1 — 28° (uP. (13) 


We recall that a function is meromorphic in a connected open set D if it is holomor- 
phic throughout D, except for isolated singularities which are poles. Since, by (13), 
S(2t) is a rational function of S(t), it follows that if S(t) is meromorphic and a 
solution (wherever it is finite) of the differential equation (7) in an open disc |t| < R, 
then its definition can be extended so that it is meromorphic and a solution (wherever it 
is finite) of the differential equation (7) also in the disc |t| < 2R. But the fundamental 
existence and uniqueness theorem guarantees that S(t) is holomorphic in a neighbour- 
hood of the origin. Consequently we can extend its definition so that it is meromorphic 
and a solution of (7) in the whole complex plane C. 

Further properties of the function S(t) may be derived from the differential equa- 
tion (7). For any constants a, f, if y(t) = aS(ft), then y(0) = y’(0) = 0. It is readily 
seen that y(t) satisfies a differential equation of the form (7) if and only if either a = 1, 
B =+lora = 2, Ap? = 1, and in the latter case with 1 replaced by 1/2 in (7). It 
follows that, for any 2 4 0, 


S(t, 1/a) = AS(A7"/71, a). (14) 


By differentiation it may be shown also that S(it, 4)/[S(it, 4)—1], where i = —1, 
is a solution of the differential equation (7) with 2 replaced by | — J. It follows that 


S(t, 1 — 4) = Sit, 4)/[S(it, 2) — 1). (15) 
By combining (14) and (15) we obtain, for any 1 # 0, 1, three more relations: 


S(t, 1/(.— 4) = (1 —a)sGd — ay“/t, /ISEG — 47", 4)- 11, (16) 
S@,Q =1/A) = 1800-41, )/DSGA "4, 4) = 1), (17) 
S(,4/Q —1)) = (1-4 S(0 = 47 "2, B/T — aS = aye, a. 8) 
As in 81, it follows from (14)—-(18) that the evaluation of S(t, 2) for all t, 2 € C 
reduces to its evaluation for 4 in the region |2 — 1| < 1,0 < &A < 1/2. Similarly 
it follows from (14) and (18) that the evaluation of S(t, 2) for all t, 2 € R reduces to 
its evaluation for A in the interval 0 < 2 < 1. We now show that S(t, 2) can then be 


calculated by the AGM algorithm. 
It is easily verified that if 


z(t) = (1+ VA)?S(t, A)/[1 + VAS(t, AY’, 
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then 
(dz/dt)? = (1+ V2)"{4doz9 — 4(1 + Ao)? + 4z}, 
where 
do =4V7/(1 4 VA). (19) 
Since z(0) = z’(0) = 0 and z’(0) ¥ 0, it follows that z(t) = S((1 + VA), Ao). Thus 
S(1+V2)t, 49) = + V4)’ S(t, 2/1 + VAS, DP. (20) 


The inequality 0 < 2 < limplies’2 < 4p < 1. Hence, by regarding (19) as a quadratic 
equation for /2, we obtain 


Vi = [b= (1 = 20)! 7/0. (21) 
If we write Vo = co/ao, where co = (a5 - b5)'/? and 0 < bo < ag, then 
V2. = (ao — bo)/ (ao + bo) = c1/a1, 
where 
a1 = (ao + bo)/2, bt = (aobo)'/*, er = (az — Bi)". 
Since 1 + /2 = ag /a1, we can rewrite (20) in the form 
S(aot, Ag) = (1 + c1/at)?S(art, A1)/U + (c1/a1)S(art, a1), 
where 4; = A = (c)/a,)?. Repeating the process, we obtain 
S(Qn—1t, An-1) = (1+ €n/An)”SCGnts An)/L1 + (n/n) Sant, An)Vs 
where 1, = (Cy /din Asn —> oo, 
an > “= M(a,b), cr 0, An > 0. 


Since S(t, 0) = sin’ t, for some (not very large) n = N we have S(ayt, Ay) © sin? ut, 
which may be considered as known. Then, by taking successivelyn = N, N—1,...,1 
we can calculate S(aot, 49). Moreover, we can start the process by taking a9 = 1, 
bo = (1 — Ao)"”. 

We now consider periodicity properties. If 2 4 1 and S(h) = 1 for some nonzero 
h e€ C then, by (13), S$(2h) = 0. Furthermore S’(2h) = 0, by (9). It follows that 
S(t) has period 2h, since S(t + 2h) is a solution of the differential equation (7) which 
satisfies the same initial conditions (8) as S(t). It remains to show that there exists such 
anh. 

Suppose first that 2 € R andO < 4 < 1. Since S”(0) = 2, we have S’(t) > 0 for 
small t > 0. If S’(t) > Ofor0 <t < T, then S(t) is a positive increasing function for 
0 <t <T. Since gj,[S(t)] > 0, we must also have S(t) < 1 forO <t < T. From the 
relation 


a 1/2 
r= | dx/gj(x) / : 
0 
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it follows that T < K (A), where 


1 
K(A) a dx/g,(x)'/. 


Hence S’(t) vanishes for some tf such that 0 < t < K (A) and we can now take T to be 
the least t > 0 for which S’(t) = 0. Then S’(T) = 0, S(T) = 1 and by letting t > T 
we obtain T = K(A). 

This shows that $(u) maps the interval [0, K (A)] bijectively onto [0, 1], and if 


u(é) -[ ieieitel? Cae vy, 
0 


then S[u(€)] = €. Thus, in the real domain, the elliptic integral of the first kind is 
inverted by the function S(w). 

Since 2 4 1, it follows that S(t) = S(t, 2) has period 2K (A). Since 2 # 0, it 
follows from (15) that S(t, 2) also has period 2i K (1 — 2). Thus S(t, A) is a doubly- 
periodic function, with a real period and a pure imaginary period. We will show that 
all periods are given by 


2mK(A)+2niKA—2) (m,neZ). 


The periods of a nonconstant meromorphic function f form a discrete additive 
subgroup of C. If f has two periods whose ratio is not real then, by the simple case 
n = 2 of Proposition VIII.7, it has periods @ , 2 such that all periods are given by 


mo, +na, (m,néeZ). 


In the present case we can take w) = 2K (A), @2 = 21K (1 — 4) since, by construction, 
2K (A) is the least positive period. 

Suppose next that 2 € R andeither 2 > 1 or 1 < 0. Then, by (14) and (15), S(f, 2) 
is again a doubly-periodic function with a real period and a pure imaginary period. 

Suppose finally that 2 € C\R. Without loss of generality, we assume %1 > 0. 
Then gj(z) does not vanish in the upper half-plane #. It follows that there exists a 
unique function h,(z), holomorphic for z ¢ # with Zhj(z) > O for z near 0, such 
that 


hy(z)? = ga(z). (22) 


Moreover, we may extend the definition so that 4, (z) is continuous and (22) continues 
to hold forz € HW UR. 
We can write S(t) = y(t*), where 


y(w) = w+aqw? +--- 
is holomorphic at the origin. By inversion of series, there exists a function 
$@) =zt+ bor +---, 


which is holomorphic at the origin, such that y[#(z)] = z. For z € # near 0, put 
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u(z) = $(z)!”, 


where the square root is chosen so that Zu(z) > 0. Then S[u(z)] = z. Differentiating 
and then squaring, we obtain 


S'[u(zylu'(@) = 1, u'@)’ = 1/gi@). 
But u’'(z) also has positive real part, since S’[u(z)] ~ 2u(z) for z > 0. Consequently 


u'(z) = 1/hj(z). Since u(z) > 0 as z > 0, we conclude that 


gos J “de/hilO), (23) 


where the path of integration is (say) a straight line segment. However, the function on 
the right is holomorphic for all z € #. Consequently, if we define u(z) by (23) then, 
by analytic continuation, the relation S[u(z)] = z continues to hold for all z € #. 
Letting z > 1, we now obtain S(h) = 1 forh = K (A), where 


1 
K():= ’ dx/ex(x)!”? 


and the square root is chosen so that g,(x)!/* is continuous and has positive real part 
for small x > 0 and actually, as we will see in a moment, forO0 < x < 1. Hence S(t) 
has period 2K (A). Furthermore, by (15), S(t) also has period 2i K (1 — 4). 

For 0 < x < | we have 


1/gi(x)/? = (1 — Ax)/?/[4x(1 — x)]/7]1 - Axl. 


If 4 = w +iv, where v > 0, then 1 — Ax = y +16, where y = 1— wx andd = vx > 0 
for0 < x < 1. Hence 


(i —Ax)!? =a +i£, 
where 
a={yt(y7 +07 7/2, 2af =6, 


first for small x > O and then, by continuity, for 0 < x < 1. Thus @ and £ are positive 
for 0 <x < 1. Consequently Zg,(x)!/? > 0 for0 <x < land 


K(A) =A+iB, 


where A > 0, B > 0. 
Similarly, for 0 < y < 1 we have 


1/gi-aQy)'/? = A- -Ay)'/7/[4y = yA - = ay 


and 1—(1—/)y = y’—io’, where y’ = 1—(1—y) y ands! = vy > Ofor0 < y <1. 
Hence 


(1-(1—A)y)!? =a! — if’, 
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where 
al = fy +? + SNP //2, 2a! fl = 0. 
Thus a’ and f’ are positive for 0 < y < 1, and 
K(1—d)=A'— iB’, 


where A’ > 0, B’ > 0. 

We will now show that the period ratio i K (1 — 4)/K (A) is not real by showing that 
the quotient K (1 — 4)/K (A) has positive real part. Since this is equivalent to showing 
that 


AA’ — BB’ > 0, 


it is sufficient to show that aa’ — Bf’ > 0 for all x, y € (0, 1). The inequality is cer- 
tainly satisfied for all x, y near 0, since a > 1, 8 > Oasx > Oanda’ > 1, fp’ > 0 
as y — 0. Thus we need only show that we never have aa’ = £6’. But 


2a? = (9? + HYP ty, 2B = (y? +017 —y, 


with analogous expressions for 2a”, 28”. Hence, if aa’ = Bf’, then by squaring we 
obtain 


(ey +)? + yy? $07)? 4 T= 1? $0)? — py? +87)? — y'1, 
which reduces to 
y (v2 4.62)! _ —y' (p24 2)", 


Squaring again, we obtain y7’? = y’*6?. Since the previous equation shows that 


and y’ do not have the same sign, it follows that 
yi +y'5=0. 
Giving y, 6, y’, 0’ their explicit expressions, this takes the form v(x + y — xy) = 0. 
Hence x(1 — y) + y = 0, which is impossible if 0 < y < l andx > 0. 
The relation S[u(z)] = z, where u(z) is defined by (23), shows that the elliptic 


integral of the first kind is inverted by the elliptic function S(u). We may use this to 
simplify other elliptic integrals. The change of variables x = S(u) replaces the integral 


J Reoax/e:09'” 
by I R[S(u)]du. Following Jacobi, we take 


E(u) := fu — AS(v)]do (24) 
0 
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as the standard elliptic integral of the second kind, and 
u 
IT(u, a) := a2) | S'(a)S(v)dv/[1 — AS(a)S(v)] (25) 
0 


as the standard elliptic integral of the third kind. 
Many properties of these functions may be obtained by integration from corre- 
sponding properties of the function S(u). By way of example, we show that 


E(u +a) — E(u—a) —2E(a) = —AS'(a)S(u)/f1 — AS(a)S(u)]. (26) 


Indeed it is evident that both sides vanish when u = 0, and it follows from (12) that 
they have the same derivative with respect to u. Integrating (26) with respect to u, we 
further obtain 


u+a 
IT(u, a) = uE(a) — aya) | E(v)dv. (27) 


Thus the function //(u, a), which depends on two variables (as well as the parame- 
ter 1) can be expressed in terms of functions of only one variable. Furthermore, we 
have the interchange property (due, in other notation, to Legendre) 


IT(u,a) —uE(a) = IT (a,u) —aE(u). (28) 
If we take u = 2K = 2K (A), then S’(u) = O and hence //(a, u) = 0. Thus 
I(2K,a) =2K E(a) —aE(2K), (29) 


which shows that the complete elliptic integral of the third kind can be expressed in 
terms of complete and incomplete elliptic integrals of the first and second kinds. 

In order to justify taking /7(u, a) as the standard elliptic integral of the third kind, 
we show finally that S(a) takes all complex values. Otherwise, if S(u) ¢ c for all 
u € C, thenc 4 0 and 


f(u) = S(u)/[Su) — ¢] 


is holomorphic in the whole complex plane. Furthermore, it is doubly-periodic with 
two periods @1, w2 whose ratio is not real. Since it is bounded in the parallelogram with 
vertices 0, @1, @2, © + @2, it follows that it is bounded in C. Hence, by Liouville’s 
theorem, f is a constant. Since S is not constant and c ¥ 0, this is a contradiction. 


4 Theta Functions 


Theta functions arise not only in connection with elliptic functions (as we will see), 
but also in problems of heat conduction, statistical mechanics and number theory. 
Consider the bi-infinite series 


ioe) ioe) fore) 
> g” 2" os fi irr fe be a ae 
n=1 


n=—0o n=1 
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where g, z € C and z ¥ 0. Both series on the right converge if |g| < 1, both diverge 
if |g| > 1, and at most one converges if |g| = 1. Thus we now assume |g| < |. 

A remarkable representation for the series on the left was given by Jacobi (1829), 
in $64 of his Fundamenta Nova, and is now generally known as Jacobi’s triple product 
formula: 


Proposition 2 [f|q| < 1 andz #0, then 


[oe] [oe] 
> gre = [Ja $9" ZU qe 2 ha ag). (30) 
n=—CO n=1 
Proof Put 
N 
fue) =[[a+q"'2d +g" 1271). 
n=1 
Then we can write 
fu( =e teh (tz) +e eH (eN 427%). (31) 


To determine the coefficients c’ we use the functional relation 


fu(q?z) = (1+ Nt 2) + 7! 27!) fv (2/0 +2) + nt) 
= (14+ 9q°%*"z) fu(z)/(qz+ 97%). 


Multiplying both sides by gz + q?% and equating coefficients of z’+! we get, 
forn =0,1,...,N—1, 


2n+1.N 2N+2n+2 .N 
que +4 c 


_ AN 2N+1 .N 
n+l — Ch4+t + q Ch > 


Le., 


gtd _ ge =(1- gree e 


But, since ee (2n—1) = N?, it follows from the definition of fy (z) that a = qh. 
Hence, for0 <n < N, 
cN = (1 _ gid _ gas ae (1 _ gq ya 1D, 
where D = (1 — q*)(1—q*)---(1—q?-*). 
If |g| < 1 and z ¥ 0, then the infinite products 


[Ja Age a), [][a Agee), [Ja _ oo) 


n=1 n=1 n=1 


are all convergent. From the convergence of the last it follows that, for each fixed n, 


[o.@) 
i N _ on 1—q2. 
slim cn = 4 / 1 q*) 
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Moreover, there exists a constant A > 0, depending on q but not on n or N, such that 
N 2 
leon |< Algl”’. 


For we can choose B > 0 so that | [[ZL, (1 — q°*)| > B for all m, we can choose 
C > 0so that | [7,0 - q°*)| < C for all m, and we can then take A = C/B?. Since 


; De ; 
the series }°°° _., q" z” is absolutely convergent, it follows that we can proceed to 


the limit term by term in (31) to obtain (30). 


: 2 
In the series 30°24, q” z” we now put 


iT 27 iv 
q=e > Ze a 


so that |g| < 1 corresponds to .%7 > 0, and we define the theta function 


co 


sage ‘ 
O(v; tT) = > etitn® o2aivn 
n=—CO 
The function 0(v; 7) is holomorphic in v and ¢ for allv € Candt € # (the upper 


half-plane). Since initially we will be more interested in the dependence on v, with t 
just a parameter, we will often write @(v) in place of @(v; 7). Furthermore, we will still 
use g as an abbreviation for e”’". 


Evidently 
6(v + 1) = O(v) = A(-v). 
Moreover, 
= 2 
A(v a t) = py ri +2n o2aivn 
n=—oco 
(oe) 
= quien 2niv > g int 2riv(ntl) 


n>=—Oo 


= e 720+) 9 (py), 
It may be immediately verified that 
0°0/dv? = —427q00/dq = 42 i00/dt, 


which becomes the partial differential equation of heat conduction in one dimension 
on putting t = 4zit. 
By Proposition 2, we have also the product representation 


co 


O(v) = [[a or amr aan 6 ap genet ey _ ay 


n=l 
It follows that the points 
v=1/24+7/2+m+nt (m,néZ) 


are simple zeros of 9(v), and that these are the only zeros. 
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One important property of the theta function is almost already known to us: 


Proposition 3 Forallv € Candt € #, 
O(v; —1/t) = (t/i)"/2e7'?’ (rv; 7), (32) 
where the square root is chosen to have positive real part. 


Proof Suppose first that t = iy, where y > 0. We wish to show that 


(oe) 


love) 
—n2n/y i = 2 
> ew m/y 2naiv = yo > e (v-+n) my 


n=—OoO n=—Oo 


But this was already proved in Proposition IX.10. 

Thus (32) holds when rt is pure imaginary. Since, with the stated choice of square 
root, both sides of (32) are holomorphic functions for v € C and t € #,, the relation 
continues to hold throughout this extended domain, by analytic continuation. 


Following Hermite (1858), for any integers a, / we now put 
[0.0] 
Ou pv) =O, pv; t) = pe (—1)hrerit(nta/2)? p2xiv(nta/2) 
n=—OoO 
(The factor (—1)4” may be made less conspicuous by writing it as e7’4”.) Since 
Oa+2,p(0) = (—1)"Oa,6(0),  9a,p-42(0) = Oa,p(0), 


there are only four essentially distinct functions, namely 


co 


Ao9(v) = > etitn -2xivn 
n=—0Oo 
ioe) 
Oo1(v) = > (—1)tetitn e2zivn 
n=—0o 
: (33) 
O19(v) > >y etit(n+l/2) pxivQn+l) | 
n=—0Oo 
ie) 
O11(v) = > (—1)"etit 41/2) priv Qn+1) 
n=—0Oo 


Moreover, 


Oo(v; T) =O(v;T), OA1(v; tT) = A(v + 1/2; 7), 
Oio(v; tT) =e OFM Oy 4.2/2; 7),  O(o; 7) = eT! O(0 + 1/2 +. 7/2; 7). 


In fact, for all integers m, n, 


6q,p(> + mt /2+n/2) = Ga+m,pin (vet iemotin’t/4—an/2)_ (34) 
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Since the zeros of @(v; 7) are the points v = 1/2+1/2+ mrt +n, the zeros of 04,4 (v) 
are the points 


v=(P41/24+ (a+ ))rt/24+mt4+n (m,neZ). 


The notation for theta functions is by no means standardized. Hermite’s notation 
reflects the underlying symmetry, but for purposes of comparison we indicate its 
connection with the more commonly used notation in Whittaker and Watson [29]: 


Ooo(0; T) = V3(x0,q), Oo1(0; tT) = Va(z0, gq), 
Aio(0; T) = V2(m0,q), A1(0; tT) =iVi (a0, q). 

It follows from the definitions that O99(v; 7), Ao1(v; 7) and 69(v; T) are even func- 
tions of v, whereas 61; (v0; 7) is an odd function of v. Moreover O9(v; 7) and 41 (v; T) 
are periodic with period | in v, but @j9(v; 7) and 61;(v; 7) change sign when v is 
increased by 1. 


All four theta functions satisfy the same partial differential equation as 0(v; 7). 
From the product expansion of @(v; t) we obtain the product expansions 


oo 
Ao0(v) = Qo [Ja a gen lerivy(y 4 gente 2mivy, 

n=1 

oo 
M1(0) = Qo [[c = grt !e2rivy( _ g2tle-dnivy 

n=1 . Be 
O10(v) = 2Qoe"'*/* cos xv [Ja ge ge), 


n=1 


00 
6141(v) = 2i Ooe™it/4 sin zo [[a = gre vl _ ge), 


n=1 


where q = e™'T and 
[oe] 
Qo = [][a-¢””). 
n=1 


In particular, 


co 


A000) = Qo [ [a+q7""'y’, 


n=1 


61(0) = Qo | Ja-4""')’, 


n=1 


(oe) 
010(0) = 2q'/* 00 [[a+4""y- 


n=1 
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By differentiating with respect to v and then putting o = 0, we obtain in addition 
0 (0) = 2ziq!/*4Q3. But 


Qo= | ]G-4")+q") 


n=1 


=[Ja-@™a-a"")a4+q")+q7"""), 


n=1 


which implies 
[o-e) 
[[a ag? YG +¢")1 ge) I, 
n=1 


It follows that 
4o0(0)4o1 (0)A10(0) = 24'/*Q5 
and hence 
8; ; (0) = iA 0(0)401 (0)410(0). (36) 


It is evident from their series definitions that, when g is replaced by —q, the func- 
tions O99 and 6; are interchanged, whereas the functions gg" 4419 and gt 4011 are 
unaltered. Hence 


Ooo(v; t +1) =Oo1(0; t),  ro(v; t +1) = e* 4 O10(0; 7), 


wi (37) 
Aoi(v; t +1) = Ao0(v; t),  11(v5 7 + 1) = e*"/7811(05 7). 
From Proposition 3 we obtain also the transformation formulas 
Ooo(0; —1/t) = (2/i)'/7e**° Goo (t0; 7), 
ee 
Aigo; —1/t) = (2/i)'7e™*” Opi (to; 7), a 


Oo1(v; —1/t) = (t/i)!/2e"!”’ Oo(z0; 7), 
611(v; —1/t) = —i(t/i)/2e7!*”’ 1 (rv; 7). 


Up to this point we have used Hermite’s notation just to dress up old results in new 
clothes. The next result breaks fresh ground. 


Proposition 4 Forallv,w €¢ Candteé #, 


Aoo0(v; T)Ao0(w; T) = Ooo(v + w; 27)Ao0(0 — w; 27) + Aio(o + w; 27)Aio(v — w; 27), 
Aio(v; T)A10(w; T) = Aio(v + w; 27)Ao0(0 — w; 27) + Aoo(v + w; 27)Aio(v — w; 27), 
Ooo(v; T)Ao1(w; T) = O10 + w; 27)Oo1(0 — w; 27) + A11(0 + w; 27)A11(0 — w; 27), 
901 (0; T)Ao1(w; T) = Ooo(v + w; 27)Aoo(v — w; 27) — Ao(0 + w; 27)Ojo(v — w; 27), 
Ao; T)A11(w; T) = A (v0 + w; 27)Oo1 (v0 — w; 27) — Aoi (v + w; 27)A11 (v0 — w; 27), 
O11 (0; T)A11(w; T) = Ajo(v + w; 27)Oo0(v — w; 27) — Aoo(v + w; 27)O10(v — w; 27). 
Proof From the definition of $\theta_{00}$,
$$\theta_{00}(v;\tau)\theta_{00}(w;\tau) = \sum_{j,k\in\mathbb{Z}} e^{\pi i\tau(j^2+k^2)}e^{2\pi i(jv+kw)} = \sum_{j+k\ \mathrm{even}} + \sum_{j+k\ \mathrm{odd}}.$$

In the first sum on the right we can write $j + k = 2m$, $j - k = 2n$. Then $j = m + n$, $k = m - n$ and
$$\sum_{j+k\ \mathrm{even}} = \sum_{m,n\in\mathbb{Z}} e^{2\pi i\tau(m^2+n^2)}e^{2\pi i(v+w)m}e^{2\pi i(v-w)n} = \theta_{00}(v+w;2\tau)\theta_{00}(v-w;2\tau).$$
In the second sum we can write $j + k = 2m + 1$, $j - k = 2n + 1$. Then $j = m + n + 1$, $k = m - n$ and
$$\sum_{j+k\ \mathrm{odd}} = \sum_{m,n\in\mathbb{Z}} e^{2\pi i\tau[(m+1/2)^2+(n+1/2)^2]}e^{2\pi iv(m+n+1)}e^{2\pi iw(m-n)} = \theta_{10}(v+w;2\tau)\theta_{10}(v-w;2\tau).$$

Adding, we obtain the first relation of the proposition. 
We obtain the second relation from the first by replacing $v$ by $v + \tau/2$ and $w$ by $w + \tau/2$. The remaining relations are obtained from the first two by increasing $v$ and/or $w$ by $1/2$.
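The parity-splitting argument is easy to confirm numerically. The following Python sketch truncates the theta series and compares both sides of the first and fourth relations of Proposition 4 at arbitrarily chosen sample points. It assumes $\theta_{01}(v;\tau) = \sum_n (-1)^n e^{\pi i\tau n^2 + 2\pi inv}$ (that is, $\theta_{00}(v+1/2;\tau)$), alongside the series for $\theta_{00}$ and $\theta_{10}$ used in the proof; the helper `theta` and its parameters are illustrative.

```python
import cmath

def theta(v, tau, half=False, sign=1, N=30):
    """Truncated sum over |n| <= N of sign**n * exp(pi*i*tau*(n+h)**2 + 2*pi*i*(n+h)*v),
    where h = 1/2 if half else 0.  (half, sign) = (False, 1), (False, -1), (True, 1)
    give theta00, theta01, theta10."""
    h = 0.5 if half else 0.0
    return sum(sign ** n * cmath.exp(1j * cmath.pi * (tau * (n + h) ** 2 + 2 * (n + h) * v))
               for n in range(-N, N + 1))

tau, v, w = 0.3 + 0.9j, 0.17 + 0.05j, -0.22 + 0.11j
s, d = v + w, v - w
even = theta(s, 2 * tau) * theta(d, 2 * tau)                      # the j + k even part
odd = theta(s, 2 * tau, half=True) * theta(d, 2 * tau, half=True)  # the j + k odd part

first_gap = theta(v, tau) * theta(w, tau) - (even + odd)
fourth_gap = theta(v, tau, sign=-1) * theta(w, tau, sign=-1) - (even - odd)
print(abs(first_gap), abs(fourth_gap))   # both at rounding-error level
```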


By taking w = v in Proposition 4, and adding or subtracting pairs of equations 
whose right sides differ only in one sign, we obtain the duplication formulas: 


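Proposition 5 For all $v \in \mathbb{C}$ and $\tau \in \mathcal{H}$,
$$\begin{aligned}
\theta_{00}(2v;2\tau) &= [\theta_{00}^2(v;\tau) + \theta_{01}^2(v;\tau)]/2\theta_{00}(0;2\tau)\\
&= [\theta_{10}^2(v;\tau) - \theta_{11}^2(v;\tau)]/2\theta_{10}(0;2\tau),\\
\theta_{10}(2v;2\tau) &= [\theta_{00}^2(v;\tau) - \theta_{01}^2(v;\tau)]/2\theta_{10}(0;2\tau)\\
&= [\theta_{10}^2(v;\tau) + \theta_{11}^2(v;\tau)]/2\theta_{00}(0;2\tau),\\
\theta_{01}(2v;2\tau) &= \theta_{00}(v;\tau)\theta_{01}(v;\tau)/\theta_{01}(0;2\tau),\\
\theta_{11}(2v;2\tau) &= \theta_{10}(v;\tau)\theta_{11}(v;\tau)/\theta_{01}(0;2\tau).
\end{aligned}$$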


From Proposition 4 we can also derive the following addition formulas: 


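Proposition 6 For all $v, w \in \mathbb{C}$ and $\tau \in \mathcal{H}$,
$$\begin{aligned}
\theta_{01}^2(0)\theta_{01}(v+w)\theta_{01}(v-w) &= \theta_{01}^2(v)\theta_{01}^2(w) - \theta_{11}^2(v)\theta_{11}^2(w)\\
&= \theta_{00}^2(v)\theta_{00}^2(w) - \theta_{10}^2(v)\theta_{10}^2(w),\\
\theta_{00}(0)\theta_{01}(0)\theta_{00}(v+w)\theta_{01}(v-w) &= \theta_{00}(v)\theta_{01}(v)\theta_{00}(w)\theta_{01}(w) + \theta_{10}(v)\theta_{11}(v)\theta_{10}(w)\theta_{11}(w),\\
\theta_{01}(0)\theta_{10}(0)\theta_{10}(v+w)\theta_{01}(v-w) &= \theta_{01}(v)\theta_{10}(v)\theta_{01}(w)\theta_{10}(w) + \theta_{00}(v)\theta_{11}(v)\theta_{00}(w)\theta_{11}(w),\\
\theta_{00}(0)\theta_{10}(0)\theta_{11}(v+w)\theta_{01}(v-w) &= \theta_{01}(v)\theta_{11}(v)\theta_{00}(w)\theta_{10}(w) + \theta_{00}(v)\theta_{10}(v)\theta_{01}(w)\theta_{11}(w),
\end{aligned}$$
where all theta functions have the same second argument $\tau$.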
Proof Consider the second relation. If we use the first and fourth relations of Proposition 4 to evaluate the products $\theta_{00}(v)\theta_{00}(w)$ and $\theta_{01}(v)\theta_{01}(w)$, we obtain
$$\theta_{00}(v)\theta_{01}(v)\theta_{00}(w)\theta_{01}(w) = \theta_{00}^2(v+w;2\tau)\theta_{00}^2(v-w;2\tau) - \theta_{10}^2(v+w;2\tau)\theta_{10}^2(v-w;2\tau).$$

Similarly, if we use the second and sixth relations of Proposition 4 to evaluate the products $\theta_{10}(v)\theta_{10}(w)$ and $\theta_{11}(v)\theta_{11}(w)$, we obtain
$$\theta_{10}(v)\theta_{11}(v)\theta_{10}(w)\theta_{11}(w) = \theta_{10}^2(v+w;2\tau)\theta_{00}^2(v-w;2\tau) - \theta_{00}^2(v+w;2\tau)\theta_{10}^2(v-w;2\tau).$$

Hence, in the second relation of the present proposition the right side is equal to
$$[\theta_{00}^2(v+w;2\tau) + \theta_{10}^2(v+w;2\tau)][\theta_{00}^2(v-w;2\tau) - \theta_{10}^2(v-w;2\tau)].$$

On the other hand, if we use the first and fourth relations of Proposition 4 to evaluate the products $\theta_{00}(0)\theta_{00}(v+w)$ and $\theta_{01}(0)\theta_{01}(v-w)$, we see that the left side is likewise equal to
$$[\theta_{00}^2(v+w;2\tau) + \theta_{10}^2(v+w;2\tau)][\theta_{00}^2(v-w;2\tau) - \theta_{10}^2(v-w;2\tau)].$$


This proves the second relation of the proposition, and the others may be proved 
similarly. 


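Corollary 7 For all $v \in \mathbb{C}$ and $\tau \in \mathcal{H}$,
$$\theta_{00}^2(0)\theta_{01}^2(v) + \theta_{10}^2(0)\theta_{11}^2(v) = \theta_{01}^2(0)\theta_{00}^2(v), \tag{39}$$
$$\theta_{10}^2(0)\theta_{01}^2(v) + \theta_{00}^2(0)\theta_{11}^2(v) = \theta_{01}^2(0)\theta_{10}^2(v). \tag{40}$$
Moreover, for all $\tau \in \mathcal{H}$,
$$\theta_{00}^4(0) = \theta_{01}^4(0) + \theta_{10}^4(0). \tag{41}$$

Proof We get (39) and (40) from the first relation of Proposition 6 by taking $w = 1/2$ and $w = (1+\tau)/2$ respectively. We obtain (41) from (39) by taking $v = 1/2$.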


If we regard (39) and (40) as a system of simultaneous linear equations for the unknowns $\theta_{01}^2(v)$, $\theta_{11}^2(v)$, then the determinant of this system is $\theta_{00}^4(0) - \theta_{10}^4(0) = \theta_{01}^4(0) \neq 0$. It follows that the square of any theta function may be expressed as a linear combination of the squares of any other two theta functions.

By substituting for the theta functions their expansions as infinite products, the formula (41) may be given the following remarkable form:
$$\prod_{n=1}^{\infty}(1+q^{2n-1})^8 - \prod_{n=1}^{\infty}(1-q^{2n-1})^8 = 16q\prod_{n=1}^{\infty}(1+q^{2n})^8.$$
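Both forms of (41) can be tested numerically. This sketch evaluates the series form of (41) and the infinite-product form above at the sample value $q = 0.4$. It assumes the theta constants are the series $\sum q^{n^2}$, $\sum (-1)^n q^{n^2}$, $\sum q^{(n+1/2)^2}$ over $n \in \mathbb{Z}$, and the truncation points are arbitrary.

```python
import math

q, N, M = 0.4, 40, 400

# Series form of (41): theta00(0)^4 = theta01(0)^4 + theta10(0)^4.
t00 = sum(q ** (n * n) for n in range(-N, N + 1))
t01 = sum((-1) ** n * q ** (n * n) for n in range(-N, N + 1))
t10 = sum(q ** ((n + 0.5) ** 2) for n in range(-N, N + 1))
series_gap = t00 ** 4 - (t01 ** 4 + t10 ** 4)

# Product form: prod(1+q^(2n-1))^8 - prod(1-q^(2n-1))^8 = 16 q prod(1+q^(2n))^8.
odd_plus = math.prod(1 + q ** (2 * n - 1) for n in range(1, M + 1))
odd_minus = math.prod(1 - q ** (2 * n - 1) for n in range(1, M + 1))
even_plus = math.prod(1 + q ** (2 * n) for n in range(1, M + 1))
product_gap = odd_plus ** 8 - odd_minus ** 8 - 16 * q * even_plus ** 8

print(series_gap, product_gap)   # both zero up to rounding error
```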
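Proposition 8 For all $v \in \mathbb{C}$ and $\tau \in \mathcal{H}$,
$$\{\theta_{00}(v)/\theta_{01}(v)\}' = \pi i\,\theta_{10}^2(0)\theta_{10}(v)\theta_{11}(v)/\theta_{01}^2(v), \tag{42}$$
$$\{\theta_{10}(v)/\theta_{01}(v)\}' = \pi i\,\theta_{00}^2(0)\theta_{00}(v)\theta_{11}(v)/\theta_{01}^2(v), \tag{43}$$
$$\{\theta_{11}(v)/\theta_{01}(v)\}' = \pi i\,\theta_{01}^2(0)\theta_{00}(v)\theta_{10}(v)/\theta_{01}^2(v), \tag{44}$$
$$\{\theta_{01}'(v)/\theta_{01}(v)\}' = \theta_{01}''(0)/\theta_{01}(0) + \pi^2\theta_{00}^2(0)\theta_{10}^2(0)\theta_{11}^2(v)/\theta_{01}^2(v). \tag{45}$$

Proof By differentiating the second relation of Proposition 6 with respect to $w$ and then putting $w = 0$, we obtain
$$\theta_{00}(0)\theta_{01}(0)[\theta_{00}'(v)\theta_{01}(v) - \theta_{00}(v)\theta_{01}'(v)] = \theta_{10}(0)\theta_{11}'(0)\theta_{10}(v)\theta_{11}(v),$$
since not only $\theta_{11}(0) = 0$ but also $\theta_{00}'(0) = \theta_{01}'(0) = \theta_{10}'(0) = 0$. Dividing by $\theta_{00}(0)\theta_{01}(0)\theta_{01}^2(v)$ and recalling the expression (36) for $\theta_{11}'(0)$, we obtain (42). Similarly, from the third and fourth relations of Proposition 6 we obtain (43) and (44).

In the same way, if we differentiate the first relation of Proposition 6 twice with respect to $w$ and then put $w = 0$, we obtain
$$\theta_{01}^2(0)[\theta_{01}''(v)\theta_{01}(v) - \theta_{01}'(v)^2] = \theta_{01}(0)\theta_{01}''(0)\theta_{01}^2(v) - \theta_{11}'(0)^2\theta_{11}^2(v).$$

Hence, using (36) again, we obtain (45).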


We are now in a position to make the connection between theta functions and 
elliptic functions. 
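The behaviour of the theta functions when their argument is increased by 1 or $\tau$ makes it clear that doubly-periodic functions may be constructed from their quotients. We put
$$\begin{aligned}
\mathrm{sn}\,u = \mathrm{sn}(u;\tau) &:= -i\,\theta_{00}(0)\theta_{11}(v)/\theta_{10}(0)\theta_{01}(v),\\
\mathrm{cn}\,u = \mathrm{cn}(u;\tau) &:= \theta_{01}(0)\theta_{10}(v)/\theta_{10}(0)\theta_{01}(v),\\
\mathrm{dn}\,u = \mathrm{dn}(u;\tau) &:= \theta_{01}(0)\theta_{00}(v)/\theta_{00}(0)\theta_{01}(v),
\end{aligned}\tag{46}$$
where $u = \pi\theta_{00}^2(0)v$.
The constant multiples are chosen so that, in addition to $\mathrm{sn}\,0 = 0$, we have $\mathrm{cn}\,0 = \mathrm{dn}\,0 = 1$. The independent variable is scaled so that, by (42)–(44),
$$\begin{aligned}
d(\mathrm{sn}\,u)/du &= \mathrm{cn}\,u\,\mathrm{dn}\,u,\\
d(\mathrm{cn}\,u)/du &= -\mathrm{sn}\,u\,\mathrm{dn}\,u,\\
d(\mathrm{dn}\,u)/du &= -\lambda\,\mathrm{sn}\,u\,\mathrm{cn}\,u,
\end{aligned}\tag{47}$$
where
$$\lambda = \lambda(\tau) = \theta_{10}^4(0;\tau)/\theta_{00}^4(0;\tau). \tag{48}$$

It follows at once from the definitions that $\mathrm{sn}\,u$ is an odd function of $u$, whereas $\mathrm{cn}\,u$ and $\mathrm{dn}\,u$ are even functions of $u$. It follows from (41) that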
$$1 - \lambda(\tau) = \theta_{01}^4(0;\tau)/\theta_{00}^4(0;\tau), \tag{49}$$
and from (39)–(40) that
$$\mathrm{cn}^2 u = 1 - \mathrm{sn}^2 u, \qquad \mathrm{dn}^2 u = 1 - \lambda\,\mathrm{sn}^2 u. \tag{50}$$
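A numerical sketch of (46)–(50) follows. It builds sn, cn and dn from truncated theta series, then checks (50) and the first equation of (47) (via a central difference) at a few complex arguments, and also the value $\lambda(i) = 1/2$ obtained in §6. The series used for $\theta_{11}$, namely $\sum_n (-1)^n e^{\pi i\tau(n+1/2)^2 + 2\pi i(n+1/2)v}$, is an assumption: it is the normalization that reproduces the asymptotic form $\theta_{11}(v) \sim 2ie^{\pi i\tau/4}\sin\pi v$ used in this section, but its defining formula is not restated here.

```python
import cmath

PI = cmath.pi

def theta(v, tau, half, sign, N=30):
    """Truncated sum over |n| <= N of sign**n * exp(pi*i*tau*(n+h)**2 + 2*pi*i*(n+h)*v),
    with h = 1/2 if half else 0."""
    h = 0.5 if half else 0.0
    return sum(sign ** n * cmath.exp(1j * PI * (tau * (n + h) ** 2 + 2 * (n + h) * v))
               for n in range(-N, N + 1))

def jacobi(u, tau):
    """sn, cn, dn from the quotients (46), and lambda(tau) from (48).
    Assumed conventions: theta00 = (False, 1), theta01 = (False, -1),
    theta10 = (True, 1), theta11 = (True, -1)."""
    z00, z01, z10 = theta(0, tau, False, 1), theta(0, tau, False, -1), theta(0, tau, True, 1)
    v = u / (PI * z00 ** 2)                      # u = pi * theta00(0)^2 * v
    t00, t01 = theta(v, tau, False, 1), theta(v, tau, False, -1)
    t10, t11 = theta(v, tau, True, 1), theta(v, tau, True, -1)
    sn = -1j * z00 * t11 / (z10 * t01)
    cn = z01 * t10 / (z10 * t01)
    dn = z01 * t00 / (z00 * t01)
    return sn, cn, dn, (z10 / z00) ** 4

tau, h = 0.2 + 1.1j, 1e-5
for u in (0.3, 0.9 + 0.4j, 2.0 - 0.3j):
    sn, cn, dn, lam = jacobi(u, tau)
    dsn = (jacobi(u + h, tau)[0] - jacobi(u - h, tau)[0]) / (2 * h)   # central difference
    print(abs(sn**2 + cn**2 - 1), abs(dn**2 + lam * sn**2 - 1), abs(dsn - cn * dn))

print(jacobi(0.0, 1j)[3])   # lambda(i), which should be 1/2
```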
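Evidently (47) implies
$$d(\mathrm{sn}^2 u)/du = 2\,\mathrm{sn}\,u\,\mathrm{cn}\,u\,\mathrm{dn}\,u,$$
$$d^2(\mathrm{sn}^2 u)/du^2 = 2(\mathrm{cn}^2 u\,\mathrm{dn}^2 u - \mathrm{sn}^2 u\,\mathrm{dn}^2 u - \lambda\,\mathrm{sn}^2 u\,\mathrm{cn}^2 u).$$
If we write $S(u) = S(u;\tau) := \mathrm{sn}^2 u$ and use (50), we can rewrite this in the form
$$\begin{aligned}
d^2S/du^2 &= 2[(1-S)(1-\lambda S) - S(1-\lambda S) - \lambda S(1-S)]\\
&= 6\lambda S^2 - 4(1+\lambda)S + 2.
\end{aligned}$$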


Since $S(0) = S'(0) = 0$, we conclude that $S(u)$ coincides with the function denoted by the same symbol in §3. However, it should be noted that now $\lambda$ is not given, but is determined by $\tau$. Thus the question arises: can we choose $\tau \in \mathcal{H}$ (the upper half-plane) so that $\lambda(\tau)$ is any prescribed complex number other than 0 or 1?

For many applications it is sufficient to know that we can choose $\tau \in \mathcal{H}$ so that $\lambda(\tau)$ is any prescribed real number between 0 and 1. Since this case is much simpler, we will deal with it now and defer treatment of the general case until the next section. We have
$$\lambda(\tau) = 1 - \theta_{01}^4(0;\tau)/\theta_{00}^4(0;\tau) = 1 - \prod_{n=1}^{\infty}\left[\frac{1-q^{2n-1}}{1+q^{2n-1}}\right]^8,$$
where $q = e^{\pi i\tau}$. If $\tau = iy$, where $y > 0$, then $0 < q < 1$. Moreover, as $y$ increases from 0 to $\infty$, $q$ decreases from 1 to 0 and the infinite product increases from 0 to 1. Thus $\lambda(\tau)$ decreases continuously from 1 to 0 and, for each $\mu \in (0,1)$, there is a unique pure imaginary $\tau \in \mathcal{H}$ such that $\lambda(\tau) = \mu$.
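The monotonicity argument also suggests a practical way to find this pure imaginary $\tau$. The sketch below (the function names are hypothetical) evaluates $\lambda(iy)$ from the truncated product and solves $\lambda(iy) = \mu$ by bisection. The bracket $[0.05, 20]$ is an assumption that is adequate when $\mu$ is not extremely close to 0 or 1.

```python
import math

def lam_imag(y):
    """lambda(iy) = 1 - prod_{n>=1} ((1 - q^(2n-1)) / (1 + q^(2n-1)))^8 with q = exp(-pi*y)."""
    q = math.exp(-math.pi * y)
    prod, n = 1.0, 1
    while True:
        t = q ** (2 * n - 1)
        if t < 1e-18:            # remaining factors no longer change the product
            break
        prod *= ((1 - t) / (1 + t)) ** 8
        n += 1
    return 1 - prod

def solve_imag_tau(mu, lo=0.05, hi=20.0, iters=100):
    """Bisection for y with lambda(iy) = mu (0 < mu < 1), using that lambda(iy) decreases in y."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if lam_imag(mid) > mu:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(solve_imag_tau(0.5))                 # close to 1, since lambda(i) = 1/2
print(lam_imag(0.7) + lam_imag(1 / 0.7))   # close to 1, since lambda(-1/tau) = 1 - lambda(tau)
```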

It should be mentioned that, also with our previous approach, $S(u)$ could have been recognized as the square of a meromorphic function by defining $\mathrm{sn}\,u$, $\mathrm{cn}\,u$, $\mathrm{dn}\,u$ to be the solution, for given $\lambda \in \mathbb{C}$, of the system of differential equations (47) which satisfies the initial conditions $\mathrm{sn}\,0 = 0$, $\mathrm{cn}\,0 = \mathrm{dn}\,0 = 1$.

Elliptic functions were first defined by Abel (1827) as the inverses of elliptic 
integrals. His definitions were modified by Jacobi (1829) to accord with Legendre’s 
normal form for elliptic integrals, and the functions sn u, cn u, dn u are generally 
known as the Jacobian elliptic functions. The actual notation is due to Gudermann 
(1838). The definition by means of theta functions was given later by Jacobi (1838) in 
lectures. 

Several properties of the Jacobian elliptic functions are easy consequences of the later definition. In the first place, all three are meromorphic in the whole $u$-plane, since the theta functions are everywhere holomorphic. Their poles are determined by the zeros of $\theta_{01}(v)$ and are all simple. Similarly, the zeros of $\mathrm{sn}\,u$, $\mathrm{cn}\,u$ and $\mathrm{dn}\,u$ are determined by the zeros of $\theta_{11}(v)$, $\theta_{10}(v)$ and $\theta_{00}(v)$ respectively and are all simple. If we put
$$K = K(\tau) := \pi\theta_{00}^2(0;\tau)/2, \qquad K' = K'(\tau) := \tau K(\tau)/i, \tag{51}$$
then we have
$$\text{Poles of } \mathrm{sn}\,u, \mathrm{cn}\,u, \mathrm{dn}\,u:\quad u = 2mK + (2n+1)iK' \quad (m, n \in \mathbb{Z}). \tag{52}$$
$$\begin{aligned}
\text{Zeros of } \mathrm{sn}\,u&:\ u = 2mK + 2niK',\\
\mathrm{cn}\,u&:\ u = (2m+1)K + 2niK', \qquad (m, n \in \mathbb{Z})\\
\mathrm{dn}\,u&:\ u = (2m+1)K + (2n+1)iK'.
\end{aligned}\tag{53}$$


From the definitions (46) of the Jacobian elliptic functions and the behaviour of the theta functions when $v$ is increased by 1 or $\tau$ we further obtain
$$\begin{aligned}
\mathrm{sn}\,u &= -\mathrm{sn}(u+2K) = \mathrm{sn}(u+2iK'),\\
\mathrm{cn}\,u &= -\mathrm{cn}(u+2K) = -\mathrm{cn}(u+2iK'),\\
\mathrm{dn}\,u &= \mathrm{dn}(u+2K) = -\mathrm{dn}(u+2iK').
\end{aligned}\tag{54}$$


It follows that all three functions are doubly-periodic. In fact $\mathrm{sn}\,u$ has periods $4K$ and $2iK'$, $\mathrm{cn}\,u$ has periods $4K$ and $2K + 2iK'$, and $\mathrm{dn}\,u$ has periods $2K$ and $4iK'$. In each case the ratio of the two periods is not real, since $\tau \in \mathcal{H}$.

Since any period must equal a difference between two poles, it must have the form $2mK + 2niK'$ for some $m, n \in \mathbb{Z}$. Since $4K$ and $2iK'$ are periods of $\mathrm{sn}\,u$, but $2K$ is not, and since any integral linear combination of periods is again a period, it follows that the periods of $\mathrm{sn}\,u$ are precisely the integral linear combinations of $4K$ and $2iK'$. Similarly the periods of $\mathrm{cn}\,u$ are the integral linear combinations of $4K$ and $2K + 2iK'$, and the periods of $\mathrm{dn}\,u$ are the integral linear combinations of $2K$ and $4iK'$.

It was shown in §3 that, if $0 < \lambda < 1$, then $S(u,\lambda)$ has least positive period $2K(\lambda)$, where
$$K(\lambda) = \int_0^1 dx/g_\lambda(x)^{1/2}.$$

But, as we have seen, there is a unique pure imaginary $\tau \in \mathcal{H}$ such that $\lambda = \lambda(\tau)$, and $2K[\lambda(\tau)]$ is then the least positive period of $\mathrm{sn}^2(u;\tau)$. Since the periods of $\mathrm{sn}^2(u;\tau)$ are $2mK + 2niK'$ $(m, n \in \mathbb{Z})$, and since $K, K'$ are real and positive when $\tau$ is pure imaginary, it follows that
$$K[\lambda(\tau)] = K(\tau).$$

The domain of validity of this relation may be extended by appealing to results which will be established in §6. In fact it holds, by analytic continuation, for all $\tau$ in the region $\mathcal{D}$ illustrated in Figure 3, since $\lambda(\tau) \in \mathcal{H}$ for $\tau \in \mathcal{D}$.
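For $\tau = iy$ the relation $K[\lambda(\tau)] = K(\tau)$ can be compared with the classical Gauss formula $K(\lambda) = \pi/[2M(1, (1-\lambda)^{1/2})]$, where $M$ is the arithmetic-geometric mean. The sketch below assumes that standard AGM formula for the complete integral and compares it with $\pi\theta_{00}^2(0;iy)/2$; the function names are illustrative.

```python
import math

def agm(a, b):
    """Arithmetic-geometric mean of positive reals a and b."""
    for _ in range(60):
        if abs(a - b) <= 1e-15 * a:
            break
        a, b = (a + b) / 2, math.sqrt(a * b)
    return a

def K_from_lambda(lam):
    """K(lambda), 0 < lambda < 1, via the Gauss AGM formula pi / (2 M(1, sqrt(1 - lambda)))."""
    return math.pi / (2 * agm(1.0, math.sqrt(1.0 - lam)))

def K_and_lambda_of_tau(y, N=40):
    """K(tau) = pi*theta00(0;tau)^2/2 and lambda(tau) = (theta10/theta00)^4 at tau = iy."""
    q = math.exp(-math.pi * y)
    t00 = sum(q ** (n * n) for n in range(-N, N + 1))
    t10 = sum(q ** ((n + 0.5) ** 2) for n in range(-N, N + 1))
    return math.pi * t00 ** 2 / 2, (t10 / t00) ** 4

for y in (0.6, 1.0, 2.5):
    K_tau, lam = K_and_lambda_of_tau(y)
    print(y, K_tau, K_from_lambda(lam))   # the last two columns agree
```

At $y = 1$, where $\lambda = 1/2$, both columns give $K \approx 1.8540746773$.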

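From the definitions (46) of the Jacobian elliptic functions, the addition formulas for the theta functions (Proposition 6) and the expression (48) for $\lambda$, we obtain addition formulas for the Jacobian functions:
$$\begin{aligned}
\mathrm{sn}(u_1+u_2) &= (\mathrm{sn}\,u_1\,\mathrm{cn}\,u_2\,\mathrm{dn}\,u_2 + \mathrm{sn}\,u_2\,\mathrm{cn}\,u_1\,\mathrm{dn}\,u_1)/(1 - \lambda\,\mathrm{sn}^2u_1\,\mathrm{sn}^2u_2),\\
\mathrm{cn}(u_1+u_2) &= (\mathrm{cn}\,u_1\,\mathrm{cn}\,u_2 - \mathrm{sn}\,u_1\,\mathrm{sn}\,u_2\,\mathrm{dn}\,u_1\,\mathrm{dn}\,u_2)/(1 - \lambda\,\mathrm{sn}^2u_1\,\mathrm{sn}^2u_2),\\
\mathrm{dn}(u_1+u_2) &= (\mathrm{dn}\,u_1\,\mathrm{dn}\,u_2 - \lambda\,\mathrm{sn}\,u_1\,\mathrm{sn}\,u_2\,\mathrm{cn}\,u_1\,\mathrm{cn}\,u_2)/(1 - \lambda\,\mathrm{sn}^2u_1\,\mathrm{sn}^2u_2).
\end{aligned}\tag{55}$$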
The addition formulas show that the evaluation of the Jacobian elliptic functions for arbitrary complex argument may be reduced to their evaluation for real and pure imaginary arguments.
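The addition formulas (55) can be spot-checked numerically. This sketch recomputes sn, cn, dn from the theta quotients (46) with truncated series and compares both sides of (55) at sample complex arguments chosen away from the poles. The series used for $\theta_{11}$, $\sum_n (-1)^n e^{\pi i\tau(n+1/2)^2 + 2\pi i(n+1/2)v}$, is an assumed normalization consistent with $\theta_{11}(v) \sim 2ie^{\pi i\tau/4}\sin\pi v$.

```python
import cmath

PI = cmath.pi

def theta(v, tau, half, sign, N=30):
    """Truncated sum over |n| <= N of sign**n * exp(pi*i*tau*(n+h)**2 + 2*pi*i*(n+h)*v),
    with h = 1/2 if half else 0."""
    h = 0.5 if half else 0.0
    return sum(sign ** n * cmath.exp(1j * PI * (tau * (n + h) ** 2 + 2 * (n + h) * v))
               for n in range(-N, N + 1))

def jacobi(u, tau):
    """sn, cn, dn via (46) and lambda via (48); theta11 assumed to be the (True, -1) series."""
    z00, z01, z10 = theta(0, tau, False, 1), theta(0, tau, False, -1), theta(0, tau, True, 1)
    v = u / (PI * z00 ** 2)
    t01 = theta(v, tau, False, -1)
    sn = -1j * z00 * theta(v, tau, True, -1) / (z10 * t01)
    cn = z01 * theta(v, tau, True, 1) / (z10 * t01)
    dn = z01 * theta(v, tau, False, 1) / (z00 * t01)
    return sn, cn, dn, (z10 / z00) ** 4

tau, u1, u2 = 0.1 + 0.8j, 0.37 + 0.12j, -0.81 + 0.25j
s1, c1, d1, lam = jacobi(u1, tau)
s2, c2, d2, _ = jacobi(u2, tau)
s12, c12, d12, _ = jacobi(u1 + u2, tau)
den = 1 - lam * s1**2 * s2**2
gaps = (s12 - (s1 * c2 * d2 + s2 * c1 * d1) / den,
        c12 - (c1 * c2 - s1 * s2 * d1 * d2) / den,
        d12 - (d1 * d2 - lam * s1 * s2 * c1 * c2) / den)
print([abs(g) for g in gaps])   # all at rounding-error level
```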

The usual addition formulas for the sine and cosine functions may be regarded as limiting cases of (55). For if $\tau = iy$ and $y \to \infty$, the product expansions (35) show that
$$\theta_{00}(v) \to 1, \quad \theta_{01}(v) \to 1, \quad \theta_{10}(v) \sim 2e^{\pi i\tau/4}\cos\pi v, \quad \theta_{11}(v) \sim 2ie^{\pi i\tau/4}\sin\pi v,$$
and hence
$$\lambda \to 0, \quad u \to \pi v, \quad \mathrm{sn}\,u \to \sin u, \quad \mathrm{cn}\,u \to \cos u, \quad \mathrm{dn}\,u \to 1.$$

The definitions (46) of the Jacobian elliptic functions and the transformation formulas (37)–(38) for the theta functions imply also transformation formulas for the Jacobian functions:


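Proposition 9 For all $u \in \mathbb{C}$ and $\tau \in \mathcal{H}$,
$$\begin{aligned}
\mathrm{sn}(u;\tau+1) &= (1-\lambda(\tau))^{1/2}\,\mathrm{sn}(u';\tau)/\mathrm{dn}(u';\tau),\\
\mathrm{cn}(u;\tau+1) &= \mathrm{cn}(u';\tau)/\mathrm{dn}(u';\tau),\\
\mathrm{dn}(u;\tau+1) &= 1/\mathrm{dn}(u';\tau),
\end{aligned}$$
where
$$u' = u/(1-\lambda(\tau))^{1/2} \quad\text{and}\quad (1-\lambda(\tau))^{1/2} = \theta_{01}^2(0;\tau)/\theta_{00}^2(0;\tau).$$
Furthermore,
$$\lambda(\tau+1) = \lambda(\tau)/[\lambda(\tau)-1], \qquad K(\tau+1) = (1-\lambda(\tau))^{1/2}K(\tau).$$

Proof With $v = u/\pi\theta_{00}^2(0;\tau+1)$ we have, by (37),
$$\mathrm{dn}(u;\tau+1) = \theta_{00}(0;\tau)\theta_{01}(v;\tau)/\theta_{01}(0;\tau)\theta_{00}(v;\tau) = 1/\mathrm{dn}(u';\tau),$$
where
$$u' = \pi\theta_{00}^2(0;\tau)v = \theta_{00}^2(0;\tau)u/\theta_{01}^2(0;\tau) = u/(1-\lambda(\tau))^{1/2}.$$
Similarly, from (37) and (48)–(49), we obtain
$$\lambda(\tau+1) = -\theta_{10}^4(0;\tau)/\theta_{01}^4(0;\tau) = \lambda(\tau)/[\lambda(\tau)-1].$$

The other relations are established in the same way.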
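Proposition 10 For all $u \in \mathbb{C}$ and $\tau \in \mathcal{H}$,
$$\begin{aligned}
\mathrm{sn}(u;-1/\tau) &= -i\,\mathrm{sn}(iu;\tau)/\mathrm{cn}(iu;\tau),\\
\mathrm{cn}(u;-1/\tau) &= 1/\mathrm{cn}(iu;\tau),\\
\mathrm{dn}(u;-1/\tau) &= \mathrm{dn}(iu;\tau)/\mathrm{cn}(iu;\tau).
\end{aligned}$$
Furthermore,
$$\lambda(-1/\tau) = 1 - \lambda(\tau), \qquad K(-1/\tau) = K'(\tau).$$
Proof With $v = u/\pi\theta_{00}^2(0;-1/\tau)$ we have, by (38),
$$\begin{aligned}
\mathrm{sn}(u;-1/\tau) &= -i\,\theta_{00}(0;-1/\tau)\theta_{11}(v;-1/\tau)/\theta_{10}(0;-1/\tau)\theta_{01}(v;-1/\tau)\\
&= -\theta_{00}(0;\tau)\theta_{11}(\tau v;\tau)/\theta_{01}(0;\tau)\theta_{10}(\tau v;\tau).
\end{aligned}$$

On the other hand, with $v' = iu/\pi\theta_{00}^2(0;\tau)$ we have
$$\mathrm{sn}(iu;\tau)/\mathrm{cn}(iu;\tau) = -i\,\theta_{00}(0;\tau)\theta_{11}(v';\tau)/\theta_{01}(0;\tau)\theta_{10}(v';\tau).$$

Since $\tau v = v'$, by comparing these two relations we obtain the first assertion of the proposition.

The next two assertions may be obtained in the same way. The final two assertions follow from (38), together with (48), (49) and (51).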


It follows from Proposition 10 that the evaluation of the Jacobian elliptic functions for pure imaginary argument and parameter $\tau$ may be reduced to their evaluation for real argument and parameter $-1/\tau$.

From the definition (46) of the Jacobian elliptic functions and the duplication formulas for the theta functions we can also obtain formulas for the Jacobian functions when the parameter $\tau$ is doubled (‘Landen’s transformation’):


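Proposition 11 For all $u \in \mathbb{C}$ and $\tau \in \mathcal{H}$,
$$\begin{aligned}
\mathrm{sn}(u'';2\tau) &= [1 + (1-\lambda(\tau))^{1/2}]\,\mathrm{sn}(u;\tau)\,\mathrm{cn}(u;\tau)/\mathrm{dn}(u;\tau),\\
\mathrm{cn}(u'';2\tau) &= \{1 - [1 + (1-\lambda(\tau))^{1/2}]\,\mathrm{sn}^2(u;\tau)\}/\mathrm{dn}(u;\tau),\\
\mathrm{dn}(u'';2\tau) &= \{1 - [1 - (1-\lambda(\tau))^{1/2}]\,\mathrm{sn}^2(u;\tau)\}/\mathrm{dn}(u;\tau),
\end{aligned}$$
where $u'' = [1 + (1-\lambda(\tau))^{1/2}]u$ and $(1-\lambda(\tau))^{1/2} = \theta_{01}^2(0;\tau)/\theta_{00}^2(0;\tau)$.

Furthermore,
$$\lambda(2\tau) = \lambda^2(\tau)/[1 + (1-\lambda(\tau))^{1/2}]^4, \qquad K(2\tau) = [1 + (1-\lambda(\tau))^{1/2}]K(\tau)/2.$$

Proof If $u = \pi\theta_{00}^2(0;\tau)v$ and $u'' = \pi\theta_{00}^2(0;2\tau)2v$ then, by Proposition 5,
$$u'' = 2\theta_{00}^2(0;2\tau)u/\theta_{00}^2(0;\tau) = [\theta_{00}^2(0;\tau) + \theta_{01}^2(0;\tau)]u/\theta_{00}^2(0;\tau).$$
Hence, by (49),
$$u'' = [1 + (1-\lambda(\tau))^{1/2}]u.$$
By Proposition 5 also,
$$\mathrm{sn}(u'';2\tau) = -i\,\theta_{00}(0;2\tau)\theta_{10}(v;\tau)\theta_{11}(v;\tau)/\theta_{10}(0;2\tau)\theta_{00}(v;\tau)\theta_{01}(v;\tau).$$
On the other hand,
$$\mathrm{sn}(u;\tau)\,\mathrm{cn}(u;\tau)/\mathrm{dn}(u;\tau) = -i\,\theta_{00}^2(0;\tau)\theta_{10}(v;\tau)\theta_{11}(v;\tau)/D,$$
where $D = \theta_{10}^2(0;\tau)\theta_{00}(v;\tau)\theta_{01}(v;\tau)$.
Since $2\theta_{00}(0;2\tau)\theta_{10}(0;2\tau) = \theta_{10}^2(0;\tau)$, it follows that
$$\mathrm{sn}(u'';2\tau) = 2\theta_{00}^2(0;2\tau)\,\mathrm{sn}(u;\tau)\,\mathrm{cn}(u;\tau)/\theta_{00}^2(0;\tau)\,\mathrm{dn}(u;\tau).$$

Since $2\theta_{00}^2(0;2\tau)/\theta_{00}^2(0;\tau) = u''/u$, this proves the first assertion of the proposition. The remaining assertions may be proved similarly.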
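The formulas for $\lambda(2\tau)$ and $K(2\tau)$ can be checked at a complex sample point. In this sketch $(1-\lambda(\tau))^{1/2}$ is taken to mean $\theta_{01}^2(0;\tau)/\theta_{00}^2(0;\tau)$, exactly as in the proposition, so no branch of the square root has to be chosen; the theta constants are the truncated series $\sum e^{\pi i\tau n^2}$, $\sum (-1)^n e^{\pi i\tau n^2}$, $\sum e^{\pi i\tau(n+1/2)^2}$.

```python
import cmath

def theta_constants(tau, N=40):
    """theta00(0;tau), theta01(0;tau), theta10(0;tau) from truncated series."""
    t00 = sum(cmath.exp(1j * cmath.pi * tau * n * n) for n in range(-N, N + 1))
    t01 = sum((-1) ** n * cmath.exp(1j * cmath.pi * tau * n * n) for n in range(-N, N + 1))
    t10 = sum(cmath.exp(1j * cmath.pi * tau * (n + 0.5) ** 2) for n in range(-N, N + 1))
    return t00, t01, t10

def lam_K_root(tau):
    """lambda(tau), K(tau) = pi*theta00^2/2, and (1 - lambda)^(1/2) := theta01^2/theta00^2."""
    t00, t01, t10 = theta_constants(tau)
    return (t10 / t00) ** 4, cmath.pi * t00 ** 2 / 2, (t01 / t00) ** 2

tau = 0.35 + 0.6j
lam, K, s = lam_K_root(tau)
lam2, K2, _ = lam_K_root(2 * tau)
lam_gap = lam2 - lam ** 2 / (1 + s) ** 4
K_gap = K2 - (1 + s) * K / 2
print(abs(lam_gap), abs(K_gap))   # both at rounding-error level
```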


We show finally how the standard elliptic integrals of the second and third kinds, defined by (24) and (25), may be expressed in terms of theta functions. If we put
$$\Theta(u) = \theta_{01}(v), \tag{56}$$
where $u = \pi\theta_{00}^2(0)v$, then since
$$\lambda S(u) = \lambda\,\mathrm{sn}^2 u = -\theta_{10}^2(0)\theta_{11}^2(v)/\theta_{00}^2(0)\theta_{01}^2(v),$$
we can rewrite (45) in the form
$$d\{\Theta'(u)/\Theta(u)\}/du = -a + 1 - \lambda S(u),$$
where $a$ is independent of $u$ and the prime on the left denotes differentiation with respect to $u$. Since $\Theta'(0) = 0$, by integrating we obtain
$$E(u) = \Theta'(u)/\Theta(u) + au.$$

To determine $a$ we take $u = K$. Since $\theta_{01}'(1/2) = \theta_{00}'(1) = \theta_{00}'(0) = 0$, we obtain $a = E/K$, where
$$E = E(K) = \int_0^K \{1 - \lambda S(u)\}\,du = \int_0^1 (1-\lambda x)\,dx/g_\lambda(x)^{1/2}$$
is a complete elliptic integral of the second kind. Thus
$$E(u) = \Theta'(u)/\Theta(u) + uE/K. \tag{57}$$
Substituting this expression for $E(u)$ in (27), we further obtain
$$\Pi(u,a) = u\Theta'(a)/\Theta(a) + (1/2)\log\{\Theta(u-a)/\Theta(u+a)\}. \tag{58}$$
6 The Modular Function

The function
$$\lambda(\tau) := \theta_{10}^4(0;\tau)/\theta_{00}^4(0;\tau),$$
which was introduced in §5, is known as the modular function. In this section we study its remarkable properties. (The term ‘modular function’, without the definite article, is also used in a more general sense, which we do not consider here.)

The modular function is holomorphic in the upper half-plane $\mathcal{H}$. Furthermore, we have

Proposition 12 For any $\tau \in \mathcal{H}$,
$$\begin{aligned}
\lambda(\tau+1) &= \lambda(\tau)/[\lambda(\tau)-1],\\
\lambda(-1/\tau) &= 1 - \lambda(\tau),\\
\lambda(-1/(\tau+1)) &= 1/[1-\lambda(\tau)],\\
\lambda((\tau-1)/\tau) &= [\lambda(\tau)-1]/\lambda(\tau),\\
\lambda(\tau/(\tau+1)) &= 1/\lambda(\tau).
\end{aligned}$$


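Proof The first two relations have already been established in Propositions 9 and 10. If, as in §1, we put
$$U\lambda = 1 - \lambda, \qquad V\lambda = 1/(1-\lambda),$$
and if we also put $T\tau = \tau + 1$, $S\tau = -1/\tau$, then they may be written in the form
$$\lambda(T\tau) = UV\lambda(\tau), \qquad \lambda(S\tau) = U\lambda(\tau).$$
It follows that
$$\lambda(-1/(\tau+1)) = \lambda(ST\tau) = U\lambda(T\tau) = U^2V\lambda(\tau) = V\lambda(\tau) = 1/[1-\lambda(\tau)].$$

Similarly,
$$\lambda((\tau-1)/\tau) = \lambda(TS\tau) = V^2\lambda(\tau) = [\lambda(\tau)-1]/\lambda(\tau),$$
$$\lambda(\tau/(\tau+1)) = \lambda(TST\tau) = UV^2\lambda(\tau) = 1/\lambda(\tau).$$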
As we saw in Proposition IV.12, together the transformations $S\tau = -1/\tau$ and $T\tau = \tau + 1$ generate the modular group $\Gamma$, consisting of all linear fractional transformations
$$\tau' = (a\tau + b)/(c\tau + d),$$
where $a, b, c, d \in \mathbb{Z}$ and $ad - bc = 1$. Consequently we can deduce the effect on $\lambda(\tau)$ of any modular transformation on $\tau$. However, Proposition 12 contains the only cases which we require.
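Proposition 12 and the conjugation symmetry (59) stated below lend themselves to a direct numerical check. The sketch evaluates $\lambda$ from truncated series for $\theta_{10}(0;\tau)$ and $\theta_{00}(0;\tau)$ at an arbitrary sample point $\tau$ (chosen so that every transformed argument keeps a comfortably positive imaginary part) and prints the differences between the two sides of each relation.

```python
import cmath

def lam(tau, N=40):
    """Modular function lambda(tau) = (theta10(0;tau) / theta00(0;tau))^4 via truncated series."""
    t00 = sum(cmath.exp(1j * cmath.pi * tau * n * n) for n in range(-N, N + 1))
    t10 = sum(cmath.exp(1j * cmath.pi * tau * (n + 0.5) ** 2) for n in range(-N, N + 1))
    return (t10 / t00) ** 4

tau = 0.3 + 0.8j
l0 = lam(tau)
checks = {
    "tau + 1":          (lam(tau + 1), l0 / (l0 - 1)),
    "-1/tau":           (lam(-1 / tau), 1 - l0),
    "-1/(tau + 1)":     (lam(-1 / (tau + 1)), 1 / (1 - l0)),
    "(tau - 1)/tau":    (lam((tau - 1) / tau), (l0 - 1) / l0),
    "tau/(tau + 1)":    (lam(tau / (tau + 1)), 1 / l0),
    "(59): -conj(tau)": (lam(-tau.conjugate()), l0.conjugate()),
}
for name, (lhs, rhs) in checks.items():
    print(name, abs(lhs - rhs))   # each difference is at rounding-error level
```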

We will now study in some detail the behaviour of the modular function in the upper half-plane. We first observe that we need only consider the behaviour of $\lambda(\tau)$ in the right half of $\mathcal{H}$. For, from the definitions of the theta functions as infinite series,
$$\theta_{00}(0;-\bar\tau) = \overline{\theta_{00}(0;\tau)}, \qquad \theta_{10}(0;-\bar\tau) = \overline{\theta_{10}(0;\tau)},$$
where the bar denotes complex conjugation, and hence
$$\lambda(-\bar\tau) = \overline{\lambda(\tau)}. \tag{59}$$


We next note that, by taking $\tau = i$ in the relation $\lambda(-1/\tau) = 1 - \lambda(\tau)$, we obtain $\lambda(i) = 1/2$. We have already seen in §5 that $\lambda(\tau)$ is real on the imaginary axis $\tau = iy$ $(y > 0)$, and decreases from 1 to 0 as $y$ increases from 0 to $\infty$. Since $\lambda(\tau+1) = \lambda(\tau)/[\lambda(\tau)-1]$, it follows that $\lambda(\tau)$ is real also on the half-line $\tau = 1 + iy$ $(y > 0)$, and increases from $-\infty$ to 0 as $y$ increases from 0 to $\infty$. Moreover, $\lambda(1+i) = -1$.

The linear fractional map $\tau = (\tau'-1)/\tau'$ maps the half-line $\Re\tau' = 1$, $\Im\tau' > 0$ onto the semi-circle $|\tau - 1/2| = 1/2$, $\Im\tau > 0$, and $\tau' = 1+i$ is mapped to $\tau = (1+i)/2$. Since
$$\lambda((\tau'-1)/\tau') = [\lambda(\tau')-1]/\lambda(\tau'),$$
it follows from what we have just proved that, as $\tau$ traverses this semi-circle from 0 to 1, $\lambda(\tau)$ is real and increases from 1 to $\infty$. Moreover, $\lambda((1+i)/2) = 2$.
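If $\Re\tau = 1/2$, then $-\bar\tau = \tau - 1$ and hence, by (59),
$$\overline{\lambda(\tau)} = \lambda(\tau-1) = \lambda(\tau)/[\lambda(\tau)-1],$$
which implies
$$|\lambda(\tau) - 1|^2 = 1.$$

Thus $w = \lambda(\tau)$ maps the half-line $\Re\tau = 1/2$, $\Im\tau > 0$ into the circle $|w - 1| = 1$. Furthermore, the map is injective. For if $\lambda(\tau_1) = \lambda(\tau_2)$, then $\lambda(2\tau_1) = \lambda(2\tau_2)$, by Proposition 11, and the map is injective on the half-line $\Re\tau = 1$, $\Im\tau > 0$. If $\tau = 1/2 + iy$, where $y \to +\infty$, then
$$\theta_{00}(0;\tau) \to 1, \qquad \theta_{10}(0;\tau) \sim 2e^{\pi i\tau/4},$$
and hence
$$\lambda(\tau) \sim 16e^{\pi i\tau} = 16ie^{-\pi y}.$$

In particular, $\lambda(\tau) \in \mathcal{H}$ for all sufficiently large $y$. Since $\lambda((1+i)/2) = 2$, it follows that $w = \lambda(\tau)$ maps the half-line $\tau = 1/2 + iy$ $(y > 1/2)$ bijectively onto the semi-circle $|w - 1| = 1$, $\Im w > 0$.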

If $|\tau| = 1$, $\Im\tau > 0$ and $\tau' = \tau/(1+\tau)$, then $\Re\tau' = 1/2$, $\Im\tau' > 0$ and $\lambda(\tau') = 1/\lambda(\tau)$. Consequently, by what we have just proved, $w = \lambda(\tau)$ maps the semi-circle $|\tau| = 1$, $\Im\tau > 0$ injectively into the line $\Re w = 1/2$.

The point $e^{\pi i/3} = (1 + i\sqrt{3})/2$ is in $\mathcal{H}$ and lies on both the line $\Re\tau = 1/2$ (with $\Im\tau = \sqrt{3}/2 > 1/2$) and the circle $|\tau| = 1$. Hence $\lambda(e^{\pi i/3})$ lies on both the semi-circle $|w - 1| = 1$, $\Im w > 0$ and the line $\Re w = 1/2$, which implies that
$$\lambda(e^{\pi i/3}) = e^{\pi i/3}.$$
Fig. 2. $w = \lambda(\tau)$ maps $\mathcal{T}$ onto $\mathcal{T}'$ (left: $\tau$-plane; right: $w$-plane).


Again, since $\lambda(\tau - 1) = \lambda(\tau)/[\lambda(\tau) - 1]$, $w = \lambda(\tau)$ maps the semi-circle $|\tau - 1| = 1$, $\Im\tau > 0$ injectively into the circle $|w| = 1$.

In particular, we have the behaviour illustrated in Figure 2: $w = \lambda(\tau)$ maps the boundary of the (non-Euclidean) ‘triangle’ $\mathcal{T}$ with vertices $A = 0$, $B = (1+i)/2$, $C = e^{\pi i/3}$ bijectively onto the boundary of the ‘triangle’ $\mathcal{T}'$ with vertices $A' = 1$, $B' = 2$, $C' = e^{\pi i/3}$. We are going to deduce from this that the region inside $\mathcal{T}$ is mapped bijectively onto the region inside $\mathcal{T}'$. The reasoning here does not depend on special properties of the function or the domain, but is quite general (the ‘principle of the argument’). To emphasize this, we will temporarily denote the independent variable by $z$, instead of $\tau$.

Choose any $w_0 \in \mathbb{C}$ which is either inside or outside the ‘triangle’ $\mathcal{T}'$, and let $\Delta$ denote the change in the argument of $w - w_0$ as $w$ traverses $\mathcal{T}'$ in the direction $A'B'C'$. Thus $\Delta = 2\pi$ or 0 according as $w_0$ is inside or outside $\mathcal{T}'$. But $\Delta$ is also the change in the argument of $\lambda(z) - w_0$ as $z$ traverses $\mathcal{T}$ in the direction $ABC$. Since $\lambda(z)$ is a nonconstant holomorphic function, the number of times that it assumes the value $w_0$ inside $\mathcal{T}$ is either zero or a positive integer $p$.

Suppose the latter, and let $z = \zeta_1, \ldots, \zeta_p$ be the points inside $\mathcal{T}$ for which $\lambda(z) = w_0$. In the neighbourhood of $\zeta_j$ we have, for some positive integer $m_j$ and some $a_{0j} \neq 0$,
$$\lambda(z) - w_0 = a_{0j}(z-\zeta_j)^{m_j} + a_{1j}(z-\zeta_j)^{m_j+1} + \cdots$$
and
$$\lambda'(z) = m_ja_{0j}(z-\zeta_j)^{m_j-1} + (m_j+1)a_{1j}(z-\zeta_j)^{m_j} + \cdots.$$
Hence
$$\lambda'(z)/[\lambda(z) - w_0] = m_j/(z-\zeta_j) + f_j(z),$$
where $f_j(z)$ is holomorphic at $\zeta_j$. Consequently
$$f(z) := \lambda'(z)/[\lambda(z) - w_0] - \sum_{j=1}^{p} m_j/(z-\zeta_j)$$
is holomorphic at every point $z$ inside $\mathcal{T}$. Hence, by Cauchy’s theorem,
$$\int_{\mathcal{T}} f(z)\,dz = 0.$$

But, since $\log[\lambda(z) - w_0] = \log|\lambda(z) - w_0| + i\arg[\lambda(z) - w_0]$,
$$\int_{\mathcal{T}} \lambda'(z)\,dz/[\lambda(z) - w_0] = i\Delta.$$

Similarly, since $\zeta_j$ is inside $\mathcal{T}$,
$$\int_{\mathcal{T}} dz/(z - \zeta_j) = 2\pi i.$$

It follows that
$$\Delta = 2\pi\sum_{j=1}^{p} m_j.$$

If $w_0$ is outside $\mathcal{T}'$, then $\Delta = 0$ and we have a contradiction. Hence $\lambda(z)$ is never outside $\mathcal{T}'$ if $z$ is inside $\mathcal{T}$. If $w_0$ is inside $\mathcal{T}'$, then $\Delta = 2\pi$. Hence $\lambda(z)$ assumes each value inside $\mathcal{T}'$ at exactly one point $z$ inside $\mathcal{T}$, and at this point $\lambda'(z) \neq 0$.

Finally, if $\lambda(z)$ assumed a value $w_0$ on $\mathcal{T}'$ at a point $z_0$ inside $\mathcal{T}$, then it would assume all values near $w_0$ in the neighbourhood of $z_0$. In particular, it would assume values outside $\mathcal{T}'$, which we have shown to be impossible. It follows that $w = \lambda(z)$ maps the region inside $\mathcal{T}$ bijectively onto the region inside $\mathcal{T}'$, and $\lambda'(z) \neq 0$ for all $z$ inside $\mathcal{T}$.

We must also have $\lambda'(z) \neq 0$ for all $z \neq 0$ on $\mathcal{T}$. Otherwise, if $\lambda(z_0) = w_0$ and $\lambda'(z_0) = 0$ for some $z_0 \in \mathcal{T} \setminus \{0\}$ then, for some $m > 1$ and $c \neq 0$,
$$\lambda(z) - w_0 \sim c(z - z_0)^m \quad \text{as } z \to z_0.$$

But this implies that $\lambda(z)$ takes values outside $\mathcal{T}'$ for some $z$ near $z_0$ inside $\mathcal{T}$.
By putting together the preceding results we see that $w = \lambda(\tau)$ maps the domain
$$\mathcal{D} := \{\tau \in \mathcal{H} : 0 < \Re\tau < 1,\ |\tau - 1/2| > 1/2\}$$
bijectively onto the upper half-plane $\mathcal{H}$, with the subdomain $k$ of $\mathcal{D}$ mapped onto the subdomain $k'$ of $\mathcal{H}$ $(k = 1, \ldots, 6)$, as illustrated in Figure 3. Moreover, the boundary in $\mathcal{H}$ of $\mathcal{D}$ is mapped bijectively onto the real axis, with the points 0 and 1 omitted.

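If we denote by $\bar{\mathcal{D}}$ the closure of $\mathcal{D}$ in $\mathcal{H}$ and by $\mathcal{D}^*$ the reflection of $\mathcal{D}$ in the imaginary axis, then it follows from (59) that $w = \lambda(\tau)$ maps the region
$$\begin{aligned}
\bar{\mathcal{D}} \cup \mathcal{D}^* = {}&\{\tau \in \mathcal{H} : 0 \le \Re\tau \le 1,\ |\tau - 1/2| \ge 1/2\}\\
&\cup \{\tau \in \mathcal{H} : -1 < \Re\tau < 0,\ |\tau + 1/2| > 1/2\}
\end{aligned}$$
bijectively onto the whole complex plane $\mathbb{C}$, with the points 0 and 1 omitted. This answers the question raised in §5.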




There remains the practical problem, for a given $w \in \mathbb{C}$, of determining $\tau \in \mathcal{H}$ such that $\lambda(\tau) = w$. If $0 < w < 1$, we can calculate $\tau$ by the AGM algorithm, using the formula (4), since $\tau = iK(1-w)/K(w)$. For complex $w$ we can use an extension of the AGM algorithm, or proceed in the following way.

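Since
$$(1 - \lambda(\tau))^{1/4} = \theta_{01}(0;\tau)/\theta_{00}(0;\tau)$$
and
$$\theta_{00}(0;\tau) = 1 + 2\sum_{n=1}^{\infty} q^{n^2}, \qquad \theta_{01}(0;\tau) = 1 + 2\sum_{n=1}^{\infty}(-1)^n q^{n^2},$$
we have
$$\begin{aligned}
[1 - (1-\lambda(\tau))^{1/4}]/[1 + (1-\lambda(\tau))^{1/4}] &= [\theta_{00}(0;\tau) - \theta_{01}(0;\tau)]/[\theta_{00}(0;\tau) + \theta_{01}(0;\tau)]\\
&= 2(q + q^9 + q^{25} + \cdots)/(1 + 2q^4 + 2q^{16} + \cdots).
\end{aligned}$$
Thus if we put
$$\varepsilon := [1 - (1-w)^{1/4}]/[1 + (1-w)^{1/4}],$$
we have to solve for $q$ the equation
$$\varepsilon/2 = (q + q^9 + q^{25} + \cdots)/(1 + 2q^4 + 2q^{16} + \cdots).$$
Expanding the right side as a power series in $q$ and inverting the relationship, we obtain
$$q = \varepsilon/2 + 2(\varepsilon/2)^5 + 15(\varepsilon/2)^9 + 150(\varepsilon/2)^{13} + O((\varepsilon/2)^{17}).$$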
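A sketch of this inversion procedure: compute $w = \lambda(\tau)$ for a known $\tau$ from the theta series, recover $q$ from the truncated series in $\varepsilon/2$, and compare with $e^{\pi i\tau}$. The principal fourth root is used for $(1-w)^{1/4}$; that it agrees with $\theta_{01}(0;\tau)/\theta_{00}(0;\tau)$ is an assumption which holds near $q = 0$ (as here) but should be checked in general.

```python
import cmath

def lam_theta(tau, N=40):
    """lambda(tau) = (theta10(0;tau) / theta00(0;tau))^4 via truncated series."""
    t00 = sum(cmath.exp(1j * cmath.pi * tau * n * n) for n in range(-N, N + 1))
    t10 = sum(cmath.exp(1j * cmath.pi * tau * (n + 0.5) ** 2) for n in range(-N, N + 1))
    return (t10 / t00) ** 4

def nome_from_lambda(w):
    """q from w = lambda(tau) via q = e + 2e^5 + 15e^9 + 150e^13 + O(e^17), with e = eps/2.
    Assumption: the principal fourth root of 1 - w equals theta01(0)/theta00(0)."""
    r = (1 - w) ** 0.25
    e = (1 - r) / (1 + r) / 2
    return e + 2 * e ** 5 + 15 * e ** 9 + 150 * e ** 13

tau = 0.5 + 0.9j
q = nome_from_lambda(lam_theta(tau))
print(abs(q - cmath.exp(1j * cmath.pi * tau)))   # tiny: the truncation error is O(e^17)
print(cmath.log(q) / (1j * cmath.pi))           # recovers tau, since pi*Re(tau) lies in (-pi, pi]
```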


To ensure rapid convergence we may suppose that, in Figure 3, $w$ is situated in the region $5'$ or on its boundary, since the general case may be reduced to this by a linear fractional transformation. It is not difficult to show that in this region $|\varepsilon|$ takes its maximum value when $w = e^{\pi i/3}$ and then
$$\varepsilon = (1 - e^{-\pi i/12})/(1 + e^{-\pi i/12}) = i\tan(\pi/24).$$

Fig. 3. $w = \lambda(\tau)$ maps $\mathcal{D}$ onto $\mathcal{H}$ (left: $\tau$-plane; right: $w$-plane).


Thus $|\varepsilon| \le \tan(\pi/24) < 2/15$ and $|\varepsilon/2|^4 < 2 \times 10^{-5}$. Since $\Im\tau \ge \sqrt{3}/2$ for $\tau$ in the region 5, for the solution $q$ we have
$$|q| \le e^{-\pi\sqrt{3}/2} < 1/15.$$

Having determined $q$, we may calculate $K(\tau)$, $\mathrm{sn}\,u, \ldots$ from their representations by theta functions.
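The elementary numerical facts quoted above can be confirmed directly. This sketch evaluates $\varepsilon$ at $w = e^{\pi i/3}$ using the principal fourth root (so $(1-w)^{1/4} = e^{-\pi i/12}$) and checks the three inequalities.

```python
import cmath
import math

w = cmath.exp(1j * math.pi / 3)          # w = e^{pi i/3}
r = (1 - w) ** 0.25                      # principal fourth root of e^{-pi i/3}, i.e. e^{-pi i/12}
eps = (1 - r) / (1 + r)
print(eps, 1j * math.tan(math.pi / 24))  # equal up to rounding
print(math.tan(math.pi / 24) < 2 / 15,
      (1 / 15) ** 4 < 2e-5,
      math.exp(-math.pi * math.sqrt(3) / 2) < 1 / 15)   # True True True
```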


7 Further Remarks 


Numerous references to the older literature on elliptic integrals and elliptic functions 
are given by Fricke [12]. The more important original contributions are readily avail- 
able in Euler [10], Lagrange [21], Legendre [22], Gauss [13], Abel [1] and Jacobi [16], 
which includes his lecture course of 1838. 

It was shown by Landen (1775) that the length of arc of a hyperbola could be ex- 
pressed as the difference of the lengths of two elliptic arcs. The change of variables 
involved is equivalent to that used by Lagrange (1784/5) in his application of the AGM 
algorithm. However, Lagrange used the transformation in much greater generality, 
and it was his idea that elliptic integrals could be calculated numerically by iterat- 
ing the transformation. The connection with the result of Landen was made explicit by 
Legendre (1786). 

By bringing together his own results and those of others the treatise of 
Legendre [22], and his earlier Exercices de calcul integral (1811/19), contributed sub- 
stantially to the discoveries of Abel and Jacobi. The supplementary third volume of his 
treatise, published in 1828 when he was 76, contains the first account of their work in 
book form. 

The most important contribution of Abel (1827) was not the replacement of ellip- 
tic integrals by elliptic functions, but the study of the latter in the complex domain. 
In this way he established their double periodicity, determined their zeros and poles 
and (besides much else) showed that they could be represented as quotients of infinite 
products. 

The triple product formula of Jacobi (1829) identified these infinite products with 
infinite series, whose rapid convergence made them well suited for numerical compu- 
tation. Infinite series of this type had in fact already appeared in the Théorie analy- 
tique de la Chaleur of Fourier (1822), and Proposition 3 had essentially been proved 
by Poisson (1827). Remarkable generalizations of the Jacobi triple product formula 
to affine Lie algebras have recently been obtained by Macdonald [23] and Kac and 
Peterson [17]. For an introductory account, see Neher [24]. 

It is difficult to understand the glee with which some authors attribute to Gauss 
results on elliptic functions, since the world owes its knowledge of these results not to 
him, but to others. Gauss’s work was undoubtedly independent and in most cases ear- 
lier, although not in the case of the arithmetic-geometric mean. The remark, in §335 of 
his Disquisitiones Arithmeticae (1801), that his results on the division of the circle into 
n equal parts applied also to the lemniscate, was one of the motivations for Abel, who 
carried out this extension. (For a modern account, see Rosen [25].) However, Gauss’s
claim in a letter to Schumacher of 30 May 1828, quoted in Krazer [20], that Abel had 
anticipated about a third of his own research is quite unjustified, and not only because 
of his inability to bring his work to a form in which it could be presented to the world. 

It was proved by Liouville (1834) that elliptic integrals of the first and second 
kinds are always ‘nonelementary’. For an introductory account of Liouville’s theory, 
see Kasper [18]. (But elliptic integrals of the third kind may be ‘elementary’; see 
Chapter IV, §7.) 

The three kinds of elliptic integral may also be characterized function-theoretically. On the Riemann surface of the algebraic function $w^2 = g(z)$, where $g$ is a cubic without repeated roots, the differential $dz/w$ is everywhere holomorphic, the differential $z\,dz/w$ is holomorphic except for a double pole at $\infty$ with zero residue, and the differential $[w(z) + w(a)]\,dz/2(z-a)w(z)$ is holomorphic except for two simple poles at $a$ and $\infty$ with residues 1 and $-1$ respectively.

Many integrals which are not visibly elliptic may be reduced to elliptic integrals by 
a change of variables. A compilation is given by Byrd and Friedman [8], pp. 254–271.

The arithmetic-geometric mean may also be defined for pairs of complex numbers; 
a thorough discussion is given by Cox [9]. For the application of the AGM algorithm 
to integrals which are not strictly elliptic, see Bartky [4]. 

The differential equation (6) is a special case of the hypergeometric differential equation. In fact, if $|\lambda| < 1$, then by expanding $(1-\lambda x)^{-1/2}$, resp. $(1-\lambda x)^{1/2}$, by the binomial theorem and integrating term by term, the complete elliptic integrals
$$K(\lambda) = \int_0^1 [4x(1-x)(1-\lambda x)]^{-1/2}\,dx, \qquad E(\lambda) = \int_0^1 [(1-\lambda x)/4x(1-x)]^{1/2}\,dx,$$
may be identified with the hypergeometric functions
$$(\pi/2)F(1/2, 1/2; 1; \lambda), \qquad (\pi/2)F(-1/2, 1/2; 1; \lambda),$$
where
$$F(\alpha, \beta; \gamma; z) = 1 + \frac{\alpha\beta}{1\cdot\gamma}z + \frac{\alpha(\alpha+1)\beta(\beta+1)}{1\cdot 2\cdot\gamma(\gamma+1)}z^2 + \cdots.$$

Many transformation formulas for the complete elliptic integrals may be regarded as special cases of more general transformation formulas for the hypergeometric function.
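The identification with hypergeometric functions can be checked numerically. The substitution $x = \sin^2\theta$ turns the integrals for $K(\lambda)$ and $E(\lambda)$ into $\int_0^{\pi/2}(1-\lambda\sin^2\theta)^{\mp 1/2}\,d\theta$, which the sketch evaluates with the trapezoidal rule (very accurate here because the integrands are smooth, even and $\pi$-periodic) and compares with the Gauss series summed term by term. The sample value $\lambda = 0.6$ and the helper names are arbitrary.

```python
import math

def hyp2f1(a, b, c, z, max_terms=10_000):
    """Gauss series F(a, b; c; z) for |z| < 1, summed term by term."""
    total, term = 1.0, 1.0
    for k in range(max_terms):
        term *= (a + k) * (b + k) / ((k + 1) * (c + k)) * z
        total += term
        if abs(term) < 1e-17 * abs(total):
            break
    return total

def trapezoid_quarter_period(f, n=200):
    """Trapezoidal rule on [0, pi/2] with n subintervals."""
    h = (math.pi / 2) / n
    return h * (0.5 * (f(0.0) + f(math.pi / 2)) + sum(f(k * h) for k in range(1, n)))

lam = 0.6
K_int = trapezoid_quarter_period(lambda t: 1 / math.sqrt(1 - lam * math.sin(t) ** 2))
E_int = trapezoid_quarter_period(lambda t: math.sqrt(1 - lam * math.sin(t) ** 2))
print(K_int, math.pi / 2 * hyp2f1(0.5, 0.5, 1.0, lam))    # K(lambda) two ways
print(E_int, math.pi / 2 * hyp2f1(-0.5, 0.5, 1.0, lam))   # E(lambda) two ways
```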

The proof in §3 that $K(1-\lambda)/K(\lambda)$ has positive real part is due to Falk [11].

It follows from (12)–(13) by induction that $S(nu)$ and $S'(nu)/S'(u)$ are rational functions of $S(u)$ for every integer $n$. The elliptic function $S(u)$ is said to admit complex multiplication if $S(\mu u)$ is a rational function of $S(u)$ for some complex number $\mu$ which is not an integer. It may be shown that $S(u)$ admits complex multiplication if and only if $\lambda \neq 0, 1$ and the period ratio $iK(1-\lambda)/K(\lambda)$ is a quadratic irrational, in the sense of Chapter IV. This condition is obviously satisfied if $\lambda = 1/2$, the case of the lemniscate.

A function $f(u)$ is said to possess an algebraic addition theorem if there is a polynomial $p(x, y, z)$, not identically zero and with coefficients independent of $u, v$, such that
$$p(f(u+v), f(u), f(v)) = 0 \quad \text{for all } u, v.$$
It may be shown that a function $f$, which is meromorphic in the whole complex plane, has an algebraic addition theorem if and only if it is either a rational function or, when the independent variable is scaled by a constant factor, a rational function of $S(u,\lambda)$ and its derivative $S'(u,\lambda)$ for some $\lambda \in \mathbb{C}$. This result (in different notation) is due to Weierstrass and is proved in Akhiezer [3], for example. A generalization of Weierstrass’ theorem, due to Myrberg, is proved in Belavin and Drinfeld [6].

The term ‘elliptic function’ is often used to denote any function which is meromorphic in the whole complex plane and has two periods whose ratio is not real. It may be shown that, if the independent variable is scaled by a constant factor, an elliptic function in this general sense is a rational function of $S(u,\lambda)$ and $S'(u,\lambda)$ for some $\lambda \neq 0, 1$.

The functions $f(v)$ which are holomorphic in the whole complex plane $\mathbb{C}$ and satisfy the functional equations
$$f(v+1) = f(v), \qquad f(v+\tau) = e^{-\pi in(2v+\tau)}f(v),$$
where $n \in \mathbb{N}$ and $\tau \in \mathcal{H}$, form an $n$-dimensional complex vector space. It was shown by Hermite (1862) that this may be used to derive many relations between theta functions, such as Proposition 6.

Proposition 11 can be extended to give transformation formulas for the Jacobian 
functions when the parameter $\tau$ is multiplied by any positive integer $n$. See, for example, Tannery and Molk [27], vol. II.

The modular function was used by Picard (1879) to prove that a function f(z), 
which is holomorphic for all z € C and not a constant, assumes every complex value 
except perhaps one. The exponential function $\exp z$, which does not assume the value
0, illustrates that an exceptional value may exist. A careful proof of Picard’s theorem
is given in Ahlfors [2]. (There are also proofs which do not use the modular function.) 

It was already observed by Lagrange (1813) that there is a correspondence between 
addition formulas for elliptic functions and the formulas of spherical trigonometry. 
This correspondence has been most intensively investigated by Study [26]. 

There is an $n$-dimensional generalization of theta functions, which has a useful application to the lattices studied in Chapter VIII. The theta function of an integral lattice
$\Lambda$ in $\mathbb{R}^n$ is defined by

$$\Theta_\Lambda(\tau) = \sum_{x \in \Lambda} q^{x \cdot x} = 1 + \sum_{m \ge 1} N_m q^m,$$

where $q = e^{\pi i \tau}$ and $N_m$ is the number of vectors in $\Lambda$ with square-norm $m$. If $n = 1$
and $\Lambda = \mathbb{Z}$, then

$$\Theta_{\mathbb{Z}}(\tau) = 1 + 2q + 2q^4 + 2q^9 + \cdots = \theta(0; \tau).$$

It is easily seen that $\Theta_\Lambda(\tau)$ is a holomorphic function of $\tau$ in the half-plane $\operatorname{Im}\tau > 0$.
It follows from Poisson’s summation formula that the theta function of the dual lattice
$\Lambda^*$ is given by

$$\Theta_{\Lambda^*}(\tau) = d(\Lambda)\,(i/\tau)^{n/2}\,\Theta_\Lambda(-1/\tau) \quad \text{for } \operatorname{Im}\tau > 0.$$
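As a numerical sanity check of the last formula, here is a minimal Python sketch for the one-dimensional lattices $\Lambda = c\mathbb{Z}$ with $c > 0$, for which $\Lambda^* = c^{-1}\mathbb{Z}$ and $d(\Lambda) = c$. It assumes $\tau = it$ with $t > 0$, so that $(i/\tau)^{1/2} = t^{-1/2}$ and $-1/\tau = i/t$. The restriction to purely imaginary $\tau$, the use of non-integral lattices (Poisson's formula does not need integrality), the truncation at $|n| \le 200$ and the function names are all choices made for this sketch.

```python
import math

def theta_cZ(c, t, terms=200):
    """Theta function of the lattice c*Z at tau = i*t (t > 0): the sum over n of
    q^{(cn)^2} with q = e^{pi i tau}, i.e. exp(-pi*t*(c*n)**2), truncated to |n| <= terms."""
    return sum(math.exp(-math.pi * t * (c * n) ** 2) for n in range(-terms, terms + 1))

def dual_formula_sides(c, t):
    """Both sides of Theta_{L*}(tau) = d(L) (i/tau)^{1/2} Theta_L(-1/tau)
    for L = c*Z, L* = (1/c)*Z, d(L) = c and tau = i*t."""
    lhs = theta_cZ(1 / c, t)
    rhs = c * t ** -0.5 * theta_cZ(c, 1 / t)
    return lhs, rhs

for c, t in [(1, 0.7), (2, 0.4), (0.5, 2.5)]:
    lhs, rhs = dual_formula_sides(c, t)
    print(f"c={c}, t={t}: {lhs:.12f} {rhs:.12f}")
```

The two columns agree to rounding error; with $c = 1$ this is the classical transformation $\theta(0; -1/\tau) = (\tau/i)^{1/2}\theta(0; \tau)$ on the imaginary axis.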
Many geometrical properties of a lattice are reflected in its theta function. However, a
lattice is not uniquely determined by its theta function, since there are lattices in $\mathbb{R}^4$
(and in higher dimensions) which are not isometric but have the same theta function.


For applications of elliptic functions and theta functions to classical mechanics,
conformal mapping, geometry, theoretical chemistry, statistical mechanics and approximation theory, see Halphen [15] (vol. 2), Kober [19], Bos et al. [7], Glasser and
Zucker [14], Baxter [5] and Todd [28]. Applications to number theory will be 
considered in the next chapter. 
XIII


Connections with Number Theory 


1 Sums of Squares 


In Proposition II.40 we proved Lagrange’s theorem that every positive integer can
be represented as a sum of 4 squares. Jacobi (1829), at the end of his Fundamenta 
Nova, gave a completely different proof of this theorem with the aid of theta functions. 
Moreover, his proof provided a formula for the number of different representations. 
Hurwitz (1896), by developing further the arithmetic of quaternions which was used 
in Chapter II, also derived this formula. Here we give Jacobi’s argument preference 
since, although it is less elementary, it is more powerful. 


Proposition 1 The number of representations of a positive integer m as a sum of 
4 squares of integers is equal to 8 times the sum of those positive divisors of m which 
are not divisible by 4. 


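Proof From the series expansion

$$\theta_{00}(0) = \sum_{n \in \mathbb{Z}} q^{n^2}$$

we obtain

$$\theta_{00}^4(0) = \sum_{n_1, \ldots, n_4 \in \mathbb{Z}} q^{n_1^2 + \cdots + n_4^2} = 1 + \sum_{m \ge 1} r_4(m) q^m,$$

where $r_4(m)$ is the number of solutions in integers $n_1, \ldots, n_4$ of the equation

$$n_1^2 + n_2^2 + n_3^2 + n_4^2 = m.$$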


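We will prove the result by comparing this with another expression for $\theta_{00}^4(0)$.
We can write equation (43) of Chapter XII in the form

$$\theta_{10}'(v)/\theta_{10}(v) - \theta_{01}'(v)/\theta_{01}(v) = \pi i\,\theta_{00}^2(0)\,\theta_{00}(v)\,\theta_{11}(v)/\theta_{01}(v)\,\theta_{10}(v).$$

Differentiating with respect to $v$ and then putting $v = 0$ (note that $\theta_{10}'(0) = \theta_{01}'(0) = 0$, since $\theta_{10}$ and $\theta_{01}$ are even functions of $v$), we obtain

$$\theta_{10}''(0)/\theta_{10}(0) - \theta_{01}''(0)/\theta_{01}(0) = \pi i\,\theta_{00}^3(0)\,\theta_{11}'(0)/\theta_{01}(0)\,\theta_{10}(0) = -\pi^2\theta_{00}^4(0),$$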
by (36) of Chapter XII. Since the theta functions are all solutions of the partial differential equation

$$\partial^2 y/\partial v^2 = -4\pi^2 q\,\partial y/\partial q,$$

the last relation can be written in the form

$$4q\,\frac{d}{dq}\log\{\theta_{10}(0)/\theta_{01}(0)\} = \theta_{00}^4(0).$$


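On the other hand, the product expansions of the theta functions show that

$$\begin{aligned}
\theta_{10}(0)/\theta_{01}(0) &= 2q^{1/4}\prod_{n \ge 1}(1 + q^{2n})^2 \Big/ \prod_{n \ge 1}(1 - q^{2n-1})^2 \\
&= 2q^{1/4}\prod_{n \ge 1}(1 - q^{4n})^2 \Big/ \prod_{n \ge 1}(1 - q^{2n-1})^2(1 - q^{2n})^2 \\
&= 2q^{1/4}\prod_{n \ge 1}(1 - q^{4n})^2/(1 - q^n)^2.
\end{aligned}$$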
n>1 


Differentiating logarithmically, we obtain

$$\begin{aligned}
\theta_{00}^4(0) &= 4q\,\frac{d}{dq}\log\{\theta_{10}(0)/\theta_{01}(0)\} \\
&= 1 + 8\sum_{n \ge 1} nq^n/(1 - q^n) - 8\sum_{n \ge 1} 4nq^{4n}/(1 - q^{4n}) \\
&= 1 + 8\sum_{n \ge 1}\sum_{k \ge 1}\big(nq^{kn} - 4nq^{4kn}\big) \\
&= 1 + 8\sum_{m \ge 1}\{\sigma(m) - \sigma'(m)\}q^m,
\end{aligned}$$

where $\sigma(m)$ is the sum of all positive divisors of $m$ and $\sigma'(m)$ is the sum of all positive divisors of $m$ which are divisible by 4. Since the coefficients in a power series
expansion are uniquely determined, it follows that

$$r_4(m) = 8\{\sigma(m) - \sigma'(m)\}.$$


Proposition 1 may also be restated in the form: the number of representations of $m$
as a sum of 4 squares is equal to 8 times the sum of the odd positive divisors of $m$ if $m$
is odd, and 24 times this sum if $m$ is even. For example,

$$r_4(10) = 24(1 + 5) = 144.$$

Since any positive integer has the odd positive divisor 1, Proposition 1 provides a new
proof of Proposition II.40.
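Proposition 1 and its restatement are easy to test by machine for small $m$. The Python sketch below counts representations by exhaustive search and compares the count with both divisor formulas. The function names are ours, and the brute force is only practical for small $m$, which is all such a check needs.

```python
from itertools import product
from math import isqrt

def r_brute(s, m):
    """Number of (n_1, ..., n_s) in Z^s with n_1^2 + ... + n_s^2 = m, by exhaustive search."""
    bound = isqrt(m)
    values = range(-bound, bound + 1)
    return sum(1 for v in product(values, repeat=s) if sum(x * x for x in v) == m)

def r4_formula(m):
    """Proposition 1: 8 times the sum of the positive divisors of m not divisible by 4."""
    return 8 * sum(d for d in range(1, m + 1) if m % d == 0 and d % 4 != 0)

def r4_restated(m):
    """The restatement: 8 (m odd) or 24 (m even) times the sum of the odd divisors of m."""
    odd_sum = sum(d for d in range(1, m + 1, 2) if m % d == 0)
    return (8 if m % 2 else 24) * odd_sum

print(r_brute(4, 10), r4_formula(10), r4_restated(10))  # 144 144 144
```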

The number of representations of a positive integer as a sum of 2 squares may be 
treated in the same way, as Jacobi also showed (or, alternatively, by developing further 
the arithmetic of Gaussian integers): 




Proposition 2 The number of representations of a positive integer m as a sum of 
2 squares of integers is equal to 4 times the excess of the number of positive divisors 
of m of the form 4h + 1 over the number of positive divisors of the form 4h + 3. 


Proof We have

$$\theta_{00}^2(0) = \sum_{n_1, n_2 \in \mathbb{Z}} q^{n_1^2 + n_2^2} = 1 + \sum_{m \ge 1} r_2(m) q^m,$$

where $r_2(m)$ is the number of solutions in integers $n_1, n_2$ of the equation
$n_1^2 + n_2^2 = m$.


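To obtain another expression for $\theta_{00}^2(0)$ we use again the relation

$$\theta_{10}'(v)/\theta_{10}(v) - \theta_{01}'(v)/\theta_{01}(v) = \pi i\,\theta_{00}^2(0)\,\theta_{00}(v)\,\theta_{11}(v)/\theta_{01}(v)\,\theta_{10}(v),$$

but this time we simply take $v = 1/4$. Since

$$\theta_{01}(1/4) = \sum_{n \in \mathbb{Z}} (-i)^n q^{n^2} = \sum_{n \in \mathbb{Z}} i^n q^{n^2} = \theta_{00}(1/4),$$

and similarly $\theta_{11}(1/4) = i\,\theta_{10}(1/4)$, we obtain

$$\pi\theta_{00}^2(0) = \theta_{01}'(1/4)/\theta_{01}(1/4) - \theta_{10}'(1/4)/\theta_{10}(1/4).$$

By differentiating logarithmically the product expansion for $\theta_{10}(v)$ and then putting
$v = 1/4$, we get

$$\theta_{10}'(1/4)/\theta_{10}(1/4) = -\pi - 4\pi\sum_{n \ge 1} q^{2n}/(1 + q^{4n}).$$

Similarly, by differentiating logarithmically the product expansion for $\theta_{01}(v)$ and then
putting $v = 1/4$, we get

$$\theta_{01}'(1/4)/\theta_{01}(1/4) = 4\pi\sum_{n \ge 1} q^{2n-1}/(1 + q^{4n-2}).$$

Thus

$$\theta_{01}'(1/4)/\theta_{01}(1/4) - \theta_{10}'(1/4)/\theta_{10}(1/4) = \pi + 4\pi\sum_{n \ge 1} q^n/(1 + q^{2n})$$

and hence

$$\theta_{00}^2(0) = 1 + 4\sum_{n \ge 1} q^n/(1 + q^{2n}).$$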

Since

$$q^n/(1 + q^{2n}) = q^n(1 - q^{2n})/(1 - q^{4n}) = (q^n - q^{3n})\sum_{k \ge 0} q^{4kn},$$

it follows that

$$\begin{aligned}
\theta_{00}^2(0) &= 1 + 4\sum_{n \ge 1}\sum_{k \ge 0}\big(q^{(4k+1)n} - q^{(4k+3)n}\big) \\
&= 1 + 4\sum_{m \ge 1}\{d_1(m) - d_3(m)\}q^m,
\end{aligned}$$

where $d_1(m)$ and $d_3(m)$ are respectively the number of positive divisors of $m$
congruent to 1 and 3 mod 4. Hence

$$r_2(m) = 4\{d_1(m) - d_3(m)\}.$$


From Proposition 2 we immediately obtain again that any prime $p \equiv 1 \bmod 4$ may
be represented as a sum of 2 squares and that the representation is essentially unique.
Proposition II.39 may also be rederived.
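The same kind of brute-force comparison works for Proposition 2. In the Python sketch below (function names chosen for illustration only), a prime $p \equiv 1 \bmod 4$ has the two divisors 1 and $p$, both counted by $d_1$, so the formula gives $r_2(p) = 8$: the eight sign and order variants of a single representation $p = x^2 + y^2$.

```python
from math import isqrt

def r2_brute(m):
    """Number of (n_1, n_2) in Z^2 with n_1^2 + n_2^2 = m."""
    b = isqrt(m)
    return sum(1 for x in range(-b, b + 1) for y in range(-b, b + 1) if x * x + y * y == m)

def r2_formula(m):
    """4 {d_1(m) - d_3(m)}, where d_j(m) counts the positive divisors of m congruent to j mod 4."""
    divisors = [d for d in range(1, m + 1) if m % d == 0]
    d1 = sum(1 for d in divisors if d % 4 == 1)
    d3 = sum(1 for d in divisors if d % 4 == 3)
    return 4 * (d1 - d3)

# A prime p = 1 mod 4 has exactly 8 representations: (+-x, +-y) and (+-y, +-x).
for p in (5, 13, 17, 29):
    print(p, r2_brute(p), r2_formula(p))
```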

The number $r_s(m)$ of representations of a positive integer $m$ as a sum of $s$ squares
has been expressed by explicit formulas for many other values of s besides 2 and 4. 
Systematic ways of attacking the problem are provided by the theory of modular forms 
and the circle method of Hardy, Ramanujan and Littlewood. 


2 Partitions 


A partition of a positive integer $n$ is a set of positive integers (repetitions being allowed) with sum $n$. For example,
$\{2, 1, 1\}$ is a partition of 4. We denote the number of distinct partitions of $n$ by $p(n)$.
For example, $p(4) = 5$, since all partitions of 4 are given by

$$\{4\},\ \{3, 1\},\ \{2, 2\},\ \{2, 1, 1\},\ \{1, 1, 1, 1\}.$$

It was shown by Euler (1748) that the sequence $p(n)$ has a simple generating
function:

Proposition 3 If $|x| < 1$, then

$$1/(1 - x)(1 - x^2)(1 - x^3)\cdots = 1 + \sum_{n \ge 1} p(n)x^n.$$


Proof If $|x| < 1$, then the infinite product $\prod_{m \ge 1}(1 - x^m)$ converges and its reciprocal has a convergent power series expansion. To determine the coefficients of this
expansion note that, since

$$(1 - x^m)^{-1} = \sum_{k \ge 0} x^{km},$$

the coefficient of $x^n$ $(n \ge 1)$ in the product $\prod_{m \ge 1}(1 - x^m)^{-1}$ is the number of representations of $n$ in the form

$$n = k_1 + 2k_2 + 3k_3 + \cdots,$$

where the $k_j$ are non-negative integers. But this number is precisely $p(n)$, since any
partition is determined by the number of 1’s, 2’s, ... that it contains.
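The proof is constructive. Multiplying the geometric series $(1 - x^m)^{-1}$ together one at a time produces the coefficients $p(n)$, and the Python sketch below does exactly that with truncated coefficient lists, then checks the result against a direct enumeration of partitions. Both helper names are illustrative. Stopping after the factors with $m \le M$ (the `max_part` argument) gives the coefficients of $1/(1 - x)\cdots(1 - x^M)$ instead, that is, the number of partitions of $n$ into parts not exceeding $M$.

```python
def product_coefficients(N, max_part=None):
    """Coefficients of x^0..x^N in the product of 1/(1 - x^m) over 1 <= m <= max_part
    (all m <= N if max_part is None; factors with m > N do not affect these coefficients)."""
    top = N if max_part is None else min(max_part, N)
    coeffs = [1] + [0] * N
    for m in range(1, top + 1):
        # Multiply by 1 + x^m + x^{2m} + ...: in place, coeffs[n] += coeffs[n - m].
        for n in range(m, N + 1):
            coeffs[n] += coeffs[n - m]
    return coeffs

def partitions(n, largest=None):
    """Yield the partitions of n as non-increasing tuples of positive integers."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for k in range(min(n, largest), 0, -1):
        for rest in partitions(n - k, k):
            yield (k,) + rest

print(list(partitions(4)))       # [(4,), (3, 1), (2, 2), (2, 1, 1), (1, 1, 1, 1)]
print(product_coefficients(10))  # p(0), ..., p(10)
```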
For many purposes the discussion of convergence is superfluous and Proposition 3
may be regarded simply as a relation between formal products and formal power series. 

Euler also obtained an interesting counterpart to Proposition 3, which we will 
derive from Jacobi’s triple product formula. 


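Proposition 4 If $|x| < 1$, then

$$(1 - x)(1 - x^2)(1 - x^3)\cdots = \sum_{m \in \mathbb{Z}} (-1)^m x^{m(3m+1)/2}.$$

Proof If we take $q = x^{3/2}$ and $z = -x^{1/2}$ in Proposition XII.2, we obtain at once the
result, since

$$\prod_{n \ge 1}(1 - x^{3n})(1 - x^{3n-1})(1 - x^{3n-2}) = \prod_{k \ge 1}(1 - x^k).$$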

Proposition 4 also has a combinatorial interpretation. The coefficient of $x^n$ $(n \ge 1)$
in the power series expansion of $\prod_{k \ge 1}(1 - x^k)$ is

$$s_n = \sum (-1)^\nu,$$

where the sum is over all partitions of $n$ into unequal parts and $\nu$ is the number of parts
in the partition. In other words,

$$s_n = p_e^*(n) - p_o^*(n),$$

where $p_e^*(n)$, resp. $p_o^*(n)$, is the number of partitions of the positive integer $n$ into an
even, resp. odd, number of unequal parts. On the other hand,

$$\sum_{m \in \mathbb{Z}} (-1)^m x^{m(3m+1)/2} = 1 + \sum_{m \ge 1} (-1)^m\{x^{m(3m-1)/2} + x^{m(3m+1)/2}\}.$$

Thus Proposition 4 says that $p_e^*(n) = p_o^*(n)$ unless $n = m(3m \pm 1)/2$ for some $m \in \mathbb{N}$,
in which case $p_e^*(n) - p_o^*(n) = (-1)^m$.
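A short Python sketch can confirm this interpretation for small $n$. It counts partitions into unequal parts according to the parity of the number of parts (each part size is used at most once, so the inner loop runs downwards, as in a 0/1 knapsack) and compares the difference "even minus odd" with the coefficients of $\sum_{m \in \mathbb{Z}}(-1)^m x^{m(3m+1)/2}$. The function names are ours.

```python
def unequal_part_counts(N):
    """Lists even, odd with even[n] (odd[n]) = number of partitions of n into unequal
    parts having an even (odd) number of parts, for 0 <= n <= N."""
    even = [1] + [0] * N   # the empty partition of 0 has no parts
    odd = [0] * (N + 1)
    for k in range(1, N + 1):
        # Each part size k is used at most once, so sweep n downwards.
        for n in range(N, k - 1, -1):
            even[n], odd[n] = even[n] + odd[n - k], odd[n] + even[n - k]
    return even, odd

def pentagonal_series(N):
    """Coefficients of x^0..x^N in the sum over m in Z of (-1)^m x^{m(3m+1)/2}."""
    coeffs = [0] * (N + 1)
    for m in range(-N, N + 1):
        e = m * (3 * m + 1) // 2
        if 0 <= e <= N:
            coeffs[e] += 1 if m % 2 == 0 else -1
    return coeffs

even, odd = unequal_part_counts(30)
print([e - o for e, o in zip(even, odd)])
print(pentagonal_series(30))
```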
From Propositions 3 and 4 we obtain

$$\Big[1 + \sum_{m \ge 1}(-1)^m\{x^{m(3m-1)/2} + x^{m(3m+1)/2}\}\Big]\Big[1 + \sum_{k \ge 1} p(k)x^k\Big] = 1.$$

Multiplying out on the left side and equating to zero the coefficient of $x^n$ $(n \ge 1)$, we
obtain the recurrence relation:

$$\begin{aligned}
p(n) = {}& p(n-1) + p(n-2) - p(n-5) - p(n-7) + \cdots \\
& + (-1)^{m-1}p(n - m(3m-1)/2) + (-1)^{m-1}p(n - m(3m+1)/2) + \cdots,
\end{aligned}$$

where $p(0) = 1$ and $p(k) = 0$ for $k < 0$. This recurrence relation is quite an efficient way of calculating $p(n)$. It was used by MacMahon (1918) to calculate $p(n)$ for
$n \le 200$.
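The recurrence translates directly into code. The Python sketch below is our own illustration, not MacMahon's procedure: for each $n$ it runs through the generalized pentagonal numbers $m(3m \mp 1)/2 \le n$, and exact integer arithmetic makes values such as $p(200)$ immediate.

```python
def partition_numbers(N):
    """p(0), ..., p(N) from Euler's recurrence
    p(n) = sum over m >= 1 of (-1)^(m-1) [p(n - m(3m-1)/2) + p(n - m(3m+1)/2)],
    with p(0) = 1 and p(k) = 0 for k < 0."""
    p = [1] + [0] * N
    for n in range(1, N + 1):
        total, m = 0, 1
        while m * (3 * m - 1) // 2 <= n:
            sign = 1 if m % 2 == 1 else -1
            total += sign * p[n - m * (3 * m - 1) // 2]
            k = m * (3 * m + 1) // 2
            if k <= n:
                total += sign * p[n - k]
            m += 1
        p[n] = total
    return p

p = partition_numbers(200)
print(p[4], p[10], p[100], p[200])  # 5 42 190569292 3972999029388
```

Each $p(n)$ needs only about $2\sqrt{2n/3}$ earlier values, which is what makes the recurrence practical.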
In the same way that we proved Proposition 3 we may show that, if $|x| < 1$, then

$$1/(1 - x)(1 - x^2)\cdots(1 - x^m) = 1 + \sum_{n \ge 1} p_m(n)x^n,$$

where $p_m(n)$ is the number of partitions of $n$ into parts not exceeding $m$.


From the vast number of formulas involving partitions and their generating func- 
tions we select only one more pair, the celebrated Rogers—Ramanujan identities. The 
proof of these identities will be based on the following preliminary result: 


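Proposition 5 If $|q| < 1$ and $|x| < |q|^{-1}$, then

$$1 + \sum_{n \ge 1} x^n q^{n^2}/(q)_n = \sum_{n \ge 0} (-1)^n x^{2n} q^{n(5n+1)/2}\,(1 - x^2q^{2(2n+1)})/(q)_n(xq^{n+1})_\infty,$$

where $(a)_0 = 1$,

$$(a)_n = (1 - a)(1 - aq)\cdots(1 - aq^{n-1}) \quad \text{if } n \ge 1, \qquad \text{and} \qquad (a)_\infty = (1 - a)(1 - aq)(1 - aq^2)\cdots.$$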


Proof Consider the $q$-difference equation

$$f(x) = f(xq) + xq f(xq^2).$$

A formal power series $\sum_{n \ge 0} a_n x^n$ satisfies this equation if and only if

$$a_n(1 - q^n) = a_{n-1}q^{2n-1} \quad (n \ge 1).$$

Thus the only formal power series solution with $a_0 = 1$ is

$$f(x) = 1 + xq/(1 - q) + x^2q^4/(1 - q)(1 - q^2) + x^3q^9/(1 - q)(1 - q^2)(1 - q^3) + \cdots.$$

Moreover, if $|q| < 1$, this power series converges for all $x \in \mathbb{C}$.
If $|q| < 1$, the functions

$$F(x) = \sum_{n \ge 0} (-1)^n x^{2n} q^{n(5n+1)/2}(1 - x^2q^{2(2n+1)})/(q)_n(xq^{n+1})_\infty,$$

$$G(x) = \sum_{n \ge 0} (-1)^n x^{2n} q^{n(5n+3)/2}(1 - xq^{2n+1})/(q)_n(xq^{n+1})_\infty$$

are holomorphic for $|x| < |q|^{-1}$.
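We have

$$\begin{aligned}
F(x) - G(x) &= \sum_{n \ge 0}(-1)^n x^{2n}q^{n(5n+1)/2}\{1 - q^n + xq^{3n+1} - x^2q^{4n+2}\}/(q)_n(xq^{n+1})_\infty \\
&= \sum_{n \ge 0}(-1)^n x^{2n}q^{n(5n+1)/2}\{(1 - q^n) + xq^{3n+1}(1 - xq^{n+1})\}/(q)_n(xq^{n+1})_\infty \\
&= \sum_{n \ge 1}(-1)^n x^{2n}q^{n(5n+1)/2}/(q)_{n-1}(xq^{n+1})_\infty \\
&\qquad + xq\sum_{n \ge 0}(-1)^n x^{2n}q^{n(5n+7)/2}/(q)_n(xq^{n+2})_\infty \\
&= -x^2q^3\sum_{n \ge 0}(-1)^n x^{2n}q^{n(5n+11)/2}/(q)_n(xq^{n+2})_\infty \\
&\qquad + xq\sum_{n \ge 0}(-1)^n x^{2n}q^{n(5n+7)/2}/(q)_n(xq^{n+2})_\infty \\
&= xq\sum_{n \ge 0}(-1)^n x^{2n}q^{n(5n+7)/2}(1 - xq^{2n+2})/(q)_n(xq^{n+2})_\infty \\
&= xqG(xq).
\end{aligned}$$

Similarly,

$$\begin{aligned}
G(x) &= \sum_{n \ge 0}(-1)^n x^{2n}q^{n(5n+3)/2}\{(1 - q^n) + q^n(1 - xq^{n+1})\}/(q)_n(xq^{n+1})_\infty \\
&= \sum_{n \ge 1}(-1)^n x^{2n}q^{n(5n+3)/2}/(q)_{n-1}(xq^{n+1})_\infty + \sum_{n \ge 0}(-1)^n x^{2n}q^{n(5n+5)/2}/(q)_n(xq^{n+2})_\infty \\
&= \sum_{n \ge 0}(-1)^n x^{2n}q^{n(5n+5)/2}(1 - x^2q^{4n+4})/(q)_n(xq^{n+2})_\infty \\
&= \sum_{n \ge 0}(-1)^n (xq)^{2n}q^{n(5n+1)/2}(1 - (xq)^2q^{2(2n+1)})/(q)_n(xq \cdot q^{n+1})_\infty \\
&= F(xq).
\end{aligned}$$

Combining this with the previous relation, we obtain

$$F(x) = F(xq) + xqF(xq^2).$$

But we have seen that this $q$-difference equation has a unique holomorphic solution
$f(x)$ such that $f(0) = 1$. Since $F(0) = 1$, it follows that $F(x) = f(x)$.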
The Rogers–Ramanujan identities may now be easily derived:

Proposition 6 If $|q| < 1$, then

$$\sum_{n \ge 0} q^{n^2}/(1 - q)(1 - q^2)\cdots(1 - q^n) = \prod_{m \ge 0}(1 - q^{5m+1})^{-1}(1 - q^{5m+4})^{-1},$$

$$\sum_{n \ge 0} q^{n(n+1)}/(1 - q)(1 - q^2)\cdots(1 - q^n) = \prod_{m \ge 0}(1 - q^{5m+2})^{-1}(1 - q^{5m+3})^{-1}.$$


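Proof Put $P = \prod_{k \ge 1}(1 - q^k)$. Since $(q)_n(q^{n+1})_\infty = P$ for every $n \ge 0$, Proposition 5 and its proof give

$$\begin{aligned}
\sum_{n \ge 0} q^{n^2}/(1 - q)(1 - q^2)\cdots(1 - q^n) &= F(1) \\
&= \Big[1 + \sum_{n \ge 1}(-1)^n\{q^{n(5n-1)/2} + q^{n(5n+1)/2}\}\Big]\Big/P
\end{aligned}$$

and, since $F(q) = G(1)$,

$$\begin{aligned}
\sum_{n \ge 0} q^{n(n+1)}/(1 - q)(1 - q^2)\cdots(1 - q^n) &= F(q) \\
&= \Big[1 + \sum_{n \ge 1}(-1)^n\{q^{n(5n-3)/2} + q^{n(5n+3)/2}\}\Big]\Big/P.
\end{aligned}$$

On the other hand, by replacing $q$ by $q^{5/2}$ and $z$ by $-q^{1/2}$, resp. $-q^{3/2}$, in Jacobi’s
triple product formula (Proposition XII.2), we obtain

$$\begin{aligned}
\sum_{n \in \mathbb{Z}}(-1)^n q^{n(5n+1)/2} &= \prod_{m \ge 1}(1 - q^{5m})(1 - q^{5m-2})(1 - q^{5m-3}) \\
&= P\Big/\prod_{m \ge 0}(1 - q^{5m+1})(1 - q^{5m+4})
\end{aligned}$$

and

$$\begin{aligned}
\sum_{n \in \mathbb{Z}}(-1)^n q^{n(5n+3)/2} &= \prod_{m \ge 1}(1 - q^{5m})(1 - q^{5m-1})(1 - q^{5m-4}) \\
&= P\Big/\prod_{m \ge 0}(1 - q^{5m+2})(1 - q^{5m+3}).
\end{aligned}$$

Combining these relations with the previous ones, we obtain the result.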
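Both identities of Proposition 6 can also be checked coefficient by coefficient as formal power series. The Python sketch below truncates every series after $q^N$ and divides by $1 - q^k$ through the running-sum update $c_i \mapsto c_i + c_{i-k}$. The helper names and the cut-off $N = 60$ are arbitrary choices for the illustration.

```python
N = 60  # compare the coefficients of q^0, ..., q^N

def divide_by_one_minus(series, k):
    """Return series / (1 - q^k), truncated after q^N (coefficient lists of length N + 1)."""
    out = list(series)
    for i in range(k, N + 1):
        out[i] += out[i - k]
    return out

def rr_sum(shift):
    """Truncation of sum_{n >= 0} q^{n^2 + shift*n} / ((1 - q)(1 - q^2)...(1 - q^n))."""
    total = [0] * (N + 1)
    n = 0
    while n * n + shift * n <= N:
        term = [0] * (N + 1)
        term[n * n + shift * n] = 1
        for k in range(1, n + 1):
            term = divide_by_one_minus(term, k)
        total = [s + t for s, t in zip(total, term)]
        n += 1
    return total

def rr_product(residues):
    """Truncation of the product of 1/(1 - q^j) over j >= 1 with j mod 5 in residues."""
    series = [1] + [0] * N
    for j in range(1, N + 1):
        if j % 5 in residues:
            series = divide_by_one_minus(series, j)
    return series

print(rr_sum(0) == rr_product({1, 4}), rr_sum(1) == rr_product({2, 3}))  # True True
```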


The combinatorial interpretation of the Rogers–Ramanujan identities was pointed
out by MacMahon (1916). The first identity says that the number of partitions of a
positive integer $n$ into parts congruent to $\pm 1$ mod 5 is equal to the number of partitions
of $n$ into parts that differ by at least 2. The second identity says that the number of partitions of a positive integer $n$ into parts congruent to $\pm 2$ mod 5 is equal to the number
of partitions of $n$ into parts greater than 1 that differ by at least 2.
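MacMahon's formulation can be tested directly by listing partitions. In the Python sketch below, which is an illustration with names of our own choosing, a partition is a non-increasing tuple. "Parts that differ by at least 2" is read as: consecutive parts in this ordering differ by at least 2, which forces all parts to be distinct.

```python
def partitions(n, largest=None):
    """Yield the partitions of n as non-increasing tuples of positive integers."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for k in range(min(n, largest), 0, -1):
        for rest in partitions(n - k, k):
            yield (k,) + rest

def count_mod5(n, residues):
    """Partitions of n all of whose parts are congruent mod 5 to an element of residues."""
    return sum(1 for p in partitions(n) if all(part % 5 in residues for part in p))

def count_gap2(n, smallest):
    """Partitions of n into parts >= smallest, consecutive parts differing by at least 2."""
    return sum(
        1 for p in partitions(n)
        if all(a - b >= 2 for a, b in zip(p, p[1:])) and all(part >= smallest for part in p)
    )

for n in range(1, 13):
    print(n, count_mod5(n, {1, 4}), count_gap2(n, 1), count_mod5(n, {2, 3}), count_gap2(n, 2))
```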
A remarkable application of the Rogers–Ramanujan identities to the hard hexagon
model of statistical mechanics was found by Baxter (1981). Many other models in sta- 
tistical mechanics have been exactly solved with the aid of theta functions. A unifying 
principle is provided by the vast theory of infinite-dimensional Lie algebras which has 
been developed over the past 25 years. 

The number $p(n)$ of partitions of $n$ increases rapidly with $n$. It was first shown by
Hardy and Ramanujan (1918) that

$$p(n) \sim e^{\pi\sqrt{2n/3}}/4n\sqrt{3} \quad \text{as } n \to \infty.$$
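To see how slowly the leading term takes hold, the following Python sketch compares it with exact values of $p(n)$, computed here from the product expansion of Proposition 3. The sample values of $n$ and the function names are arbitrary choices for the illustration.

```python
import math

def partition_numbers(N):
    """p(0), ..., p(N) as coefficients of the product of 1/(1 - x^m), m = 1, ..., N."""
    p = [1] + [0] * N
    for m in range(1, N + 1):
        for n in range(m, N + 1):
            p[n] += p[n - m]
    return p

def hardy_ramanujan(n):
    """Leading term e^{pi sqrt(2n/3)} / (4 n sqrt 3) of the Hardy-Ramanujan formula."""
    return math.exp(math.pi * math.sqrt(2 * n / 3)) / (4 * n * math.sqrt(3))

p = partition_numbers(1000)
for n in (10, 100, 200, 500, 1000):
    print(n, p[n], hardy_ramanujan(n) / p[n])
```

For these values of $n$ the ratio is slightly above 1 and approaches 1 only slowly, which is why the asymptotic series and Rademacher's convergent series are needed for exact values.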


They further obtained an asymptotic series for $p(n)$, which was modified by
Rademacher (1937) into a convergent series, from which it is even possible to calculate $p(n)$ exactly. A key role in the difficult proof is played by the behaviour under
transformations of the modular group of Dedekind’s eta function

$$\eta(\tau) = q^{1/12}\prod_{k \ge 1}(1 - q^{2k}),$$

where $q = e^{\pi i\tau}$ and $\tau \in \mathcal{H}$ (the upper half-plane).

The paper of Hardy and Ramanujan contained the first use of the ‘circle method’, 
which was subsequently applied by Hardy and Littlewood to a variety of problems in 
analytic number theory. 
We define an affine plane curve over a field $K$ to be a polynomial $f(X, Y)$ in two
indeterminates with coefficients from $K$, but we regard two polynomials $f(X, Y)$ and
$f^*(X, Y)$ as defining the same affine curve if $f^* = \lambda f$ for some nonzero $\lambda \in K$. The
degree of the curve is defined without ambiguity to be the degree of the polynomial $f$.
If

$$f(X, Y) = aX + bY + c$$

is a polynomial of degree 1, the curve is said to be an affine line. If

$$f(X, Y) = aX^2 + bXY + cY^2 + lX + mY + n$$

is a polynomial of degree 2, the curve is said to be an affine conic. If $f(X, Y)$ is a
polynomial of degree 3, the curve is said to be an affine cubic. It is the cubic case in
which we will be most interested.

Let $\mathcal{C}$ be an affine plane curve over the field $K$, defined by the polynomial $f(X, Y)$.
We say that $(x, y) \in K^2$ is a point or, more precisely, a $K$-point of the affine curve
if $f(x, y) = 0$. The $K$-point $(x, y)$ is said to be non-singular if there exist $a, b \in K$,
not both zero, such that

$$f(x + X, y + Y) = aX + bY + \cdots,$$

where all unwritten terms have degree $> 1$. Since $a, b$ are uniquely determined by $f$,
we can define the tangent to the affine curve $\mathcal{C}$ at the non-singular point $(x, y)$ to be
the affine line

$$\ell(X, Y) = aX + bY - (ax + by).$$

It is easily seen that these definitions do not depend on the choice of polynomial within
an equivalence class $\{\lambda f : 0 \ne \lambda \in K\}$.

The study of the asymptotes of an affine plane curve leads one to consider also 
its ‘points at infinity’, the asymptotes being the tangents at these points. We will now 
make this precise. 

If the polynomial $f(X, Y)$ has degree $d$, then

$$F(X, Y, Z) = Z^d f(X/Z, Y/Z)$$

is a homogeneous polynomial of degree $d$ such that

$$f(X, Y) = F(X, Y, 1).$$

Furthermore, if $\tilde{F}(X, Y, Z)$ is any homogeneous polynomial such that $f(X, Y) = \tilde{F}(X, Y, 1)$, then $\tilde{F}(X, Y, Z) = Z^m F(X, Y, Z)$ for some non-negative integer $m$.

We define a projective plane curve over a field $K$ to be a homogeneous polynomial
$F(X, Y, Z)$ of degree $d > 0$ in three indeterminates with coefficients from $K$, but
we regard two homogeneous polynomials $F(X, Y, Z)$ and $F^*(X, Y, Z)$ as defining the
same projective curve if $F^* = \lambda F$ for some nonzero $\lambda \in K$. The projective curve is
said to be a projective line, conic or cubic if $F$ has degree 1, 2 or 3 respectively.

If $\mathcal{C}$ is an affine plane curve, defined by a polynomial $f(X, Y)$ of degree $d > 0$, the
projective plane curve $\bar{\mathcal{C}}$, defined by the homogeneous polynomial $Z^d f(X/Z, Y/Z)$
of the same degree, is called the projective completion of $\mathcal{C}$. Thus the projective
completion of an affine line, conic or cubic is respectively a projective line, conic
or cubic.

Let $\mathcal{C}$ be a projective plane curve over the field $K$, defined by the homogeneous
polynomial $F(X, Y, Z)$. We say that $(x, y, z) \in K^3$ is a point, or $K$-point, of $\mathcal{C}$
if $(x, y, z) \ne (0, 0, 0)$ and $F(x, y, z) = 0$, but we regard two triples $(x, y, z)$ and
$(x^*, y^*, z^*)$ as defining the same $K$-point if

$$x^* = \lambda x, \quad y^* = \lambda y, \quad z^* = \lambda z \quad \text{for some nonzero } \lambda \in K.$$

If $\bar{\mathcal{C}}$ is the projective completion of the affine plane curve $\mathcal{C}$, then a point $(x, y, z)$
of $\bar{\mathcal{C}}$ with $z \ne 0$ corresponds to a point $(x/z, y/z)$ of $\mathcal{C}$, and a point $(x, y, 0)$ of $\bar{\mathcal{C}}$
corresponds to a point at infinity of $\mathcal{C}$.

The $K$-point $(x, y, z)$ of the projective plane curve defined by the homogeneous
polynomial $F(X, Y, Z)$ is said to be non-singular if there exist $a, b, c \in K$, not all
zero, such that

$$F(x + X, y + Y, z + Z) = aX + bY + cZ + \cdots,$$

where all unwritten terms have degree $> 1$. Since $a, b, c$ are uniquely determined by
$F$, we can define the tangent to the projective curve at the non-singular point $(x, y, z)$
to be the projective line defined by $aX + bY + cZ$. It follows from Euler’s theorem on
homogeneous functions that $(x, y, z)$ is itself a point of the tangent: $a, b, c$ are the partial derivatives of $F$ at $(x, y, z)$, so $ax + by + cz = dF(x, y, z) = 0$.

It is easily seen that if $\bar{\mathcal{C}}$ is the projective completion of an affine plane curve
$\mathcal{C}$, and if $z \ne 0$, then $(x, y, z)$ is a non-singular point of $\bar{\mathcal{C}}$ if and only if $(x/z, y/z)$
is a non-singular point of $\mathcal{C}$. Moreover, if the tangent to $\bar{\mathcal{C}}$ at $(x, y, z)$ is the
projective line

$$\ell(X, Y, Z) = aX + bY + cZ,$$

then the tangent to $\mathcal{C}$ at $(x/z, y/z)$ is the affine line defined by

$$\ell(X, Y, 1) = aX + bY + c.$$


Let $\mathcal{C}$ be an affine plane curve over the field $K$, defined by the polynomial $f(X, Y)$,
and let $(x, y)$ be a non-singular $K$-point of $\mathcal{C}$. Then we can write

$$f(x + X, y + Y) = aX + bY + f_2(X, Y) + \cdots,$$

where $a, b$ are not both zero, $f_2(X, Y)$ is a homogeneous polynomial of degree 2, and
all unwritten terms have degree $> 2$. The non-singular point $(x, y)$ is said to be an
inflection point or, more simply, a flex of $\mathcal{C}$ if $f_2(X, Y)$ is divisible by $aX + bY$.
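For a polynomial with rational coefficients these definitions can be applied mechanically: expand $f(x + X, y + Y)$ by the binomial theorem and read off the homogeneous parts of degree 1 and 2. The Python sketch below does this. Storing a polynomial as a dictionary that maps $(i, j)$ to the coefficient of $X^iY^j$ is an assumption of the sketch (integers or `fractions.Fraction` values both work). It also uses the fact that, for $(a, b) \ne (0, 0)$, a binary quadratic form $f_2$ is divisible by $aX + bY$ exactly when $f_2(b, -a) = 0$.

```python
from math import comb

def shifted(f, x, y):
    """Coefficients of f(x + X, y + Y), as a dict {(k, l): coefficient of X^k Y^l}."""
    g = {}
    for (i, j), c in f.items():
        for k in range(i + 1):
            for l in range(j + 1):
                term = c * comb(i, k) * comb(j, l) * x ** (i - k) * y ** (j - l)
                g[(k, l)] = g.get((k, l), 0) + term
    return g

def classify_point(f, x, y):
    """Return ('singular', None), ('non-singular', tangent) or ('flex', tangent),
    where tangent = (a, b, -(a*x + b*y)) encodes the line aX + bY - (ax + by)."""
    g = shifted(f, x, y)
    if g.get((0, 0), 0) != 0:
        raise ValueError("(x, y) is not a point of the curve")
    a, b = g.get((1, 0), 0), g.get((0, 1), 0)
    if a == 0 and b == 0:
        return "singular", None
    tangent = (a, b, -(a * x + b * y))
    # f_2(X, Y) = g20 X^2 + g11 XY + g02 Y^2 is divisible by aX + bY iff f_2(b, -a) = 0.
    f2_at = g.get((2, 0), 0) * b * b - g.get((1, 1), 0) * b * a + g.get((0, 2), 0) * a * a
    return ("flex" if f2_at == 0 else "non-singular"), tangent

# The affine cubic Y^2 - X^3 + 2 (that is, Y^2 = X^3 - 2) at its point (3, 5):
cubic = {(0, 2): 1, (3, 0): -1, (0, 0): 2}
print(classify_point(cubic, 3, 5))   # ('non-singular', (-27, 10, 31))
```

For instance, $Y - X^3$ has a flex at the origin, $Y - X^2$ does not, and $Y^2 - X^3$ is singular there.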

Similarly we can define a flex for a projective plane curve. Let $(x, y, z)$ be a non-singular point of the projective plane curve over the field $K$, defined by the homogeneous polynomial $F(X, Y, Z)$. Then we can write

$$F(x + X, y + Y, z + Z) = aX + bY + cZ + F_2(X, Y, Z) + \cdots,$$

where $a, b, c$ are not all zero, $F_2(X, Y, Z)$ is a homogeneous polynomial of degree 2,
and all unwritten terms have degree $> 2$. The non-singular point $(x, y, z)$ is said to be
a flex if $F_2(X, Y, Z)$ is divisible by $aX + bY + cZ$.

Two more definitions are required before we embark on our study of cubic
curves. A projective curve over the field $K$, defined by the homogeneous polynomial
$F(X, Y, Z)$ of degree $d > 0$, is said to be reducible over $K$ if

$$F(X, Y, Z) = F_1(X, Y, Z)F_2(X, Y, Z),$$

where $F_1$ and $F_2$ are homogeneous polynomials of degree less than $d$ with coefficients
from $K$. The $K$-points of the curve defined by $F$ are then just the $K$-points of the curve
defined by $F_1$, together with the $K$-points of the curve defined by $F_2$. A curve is said
to be irreducible over $K$ if it is not reducible over $K$.

Two projective curves over the field $K$, defined by the homogeneous polynomials
$F(X, Y, Z)$ and $G(X', Y', Z')$, are said to be projectively equivalent if there exists an
invertible linear transformation

$$\begin{aligned}
X &= a_{11}X' + a_{12}Y' + a_{13}Z', \\
Y &= a_{21}X' + a_{22}Y' + a_{23}Z', \\
Z &= a_{31}X' + a_{32}Y' + a_{33}Z'
\end{aligned}$$

with coefficients $a_{jk} \in K$ such that

$$F(a_{11}X' + \cdots, a_{21}X' + \cdots, a_{31}X' + \cdots) = G(X', Y', Z').$$

It is clear that $F$ and $G$ necessarily have the same degree, and that projective equivalence is in fact an equivalence relation.
Consider now the affine cubic curve $\mathcal{C}$ defined by the polynomial

$$\begin{aligned}
f(X, Y) = {}& a_{30}X^3 + a_{21}X^2Y + a_{12}XY^2 + a_{03}Y^3 + a_{20}X^2 + a_{11}XY \\
& + a_{02}Y^2 + a_{10}X + a_{01}Y + a_{00}.
\end{aligned}$$

We assume that $\mathcal{C}$ has a non-singular $K$-point which is a flex. Without loss of generality, suppose that this is the origin. Then $a_{00} = 0$, $a_{10}$ and $a_{01}$ are not both zero, and

$$a_{20}X^2 + a_{11}XY + a_{02}Y^2 = (a_{10}X + a_{01}Y)(a_{10}'X + a_{01}'Y)$$

for some $a_{10}', a_{01}' \in K$. By an invertible linear change of variables we may suppose
that $a_{10} = 0$, $a_{01} = 1$. Then $f$ has the form

$$f(X, Y) = Y + a_1XY + a_3Y^2 - a_0X^3 - a_2X^2Y - a_4XY^2 - a_6Y^3.$$

If $a_0 = 0$, then $f$ is divisible by $Y$ and the corresponding projective curve is reducible.
Thus we now assume $a_0 \ne 0$. In fact we may assume $a_0 = 1$, by replacing $f$ by a constant multiple and then scaling $Y$. The projective completion $\bar{\mathcal{C}}$ of $\mathcal{C}$ is now defined
by the homogeneous polynomial

$$YZ^2 + a_1XYZ + a_3Y^2Z - X^3 - a_2X^2Y - a_4XY^2 - a_6Y^3.$$

If we interchange $Y$ and $Z$, the flex becomes the unique point at infinity of the affine
cubic curve defined by the polynomial

$$Y^2 + a_1XY + a_3Y - (X^3 + a_2X^2 + a_4X + a_6).$$

This can be further simplified by making mild restrictions on the field $K$. If $K$ has
characteristic $\ne 2$, i.e. if $1 + 1 \ne 0$, then by replacing $Y$ by $(Y - a_1X - a_3)/2$ we
obtain the cubic curve defined by the polynomial

$$Y^2 - (4X^3 + b_2X^2 + 2b_4X + b_6),$$

where $b_2 = a_1^2 + 4a_2$, $b_4 = a_1a_3 + 2a_4$ and $b_6 = a_3^2 + 4a_6$. If $K$ also has characteristic $\ne 3$, i.e. if $1 + 1 + 1 \ne 0$, then by replacing $X$ by
$(X - 3b_2)/6^2$ and $Y$ by $2Y/6^3$, we obtain the cubic curve defined by the polynomial
$Y^2 - (X^3 + aX + b)$. Thus we have proved:


Proposition 7 If a projective cubic curve over the field $K$ is irreducible and has a
non-singular $K$-point which is a flex, then it is projectively equivalent to the projective
completion $\mathcal{W} = \mathcal{W}(a_1, \ldots, a_6)$ of an affine curve of the form

$$Y^2 + a_1XY + a_3Y - (X^3 + a_2X^2 + a_4X + a_6).$$

If $K$ has characteristic $\ne 2, 3$, then it is projectively equivalent to the projective
completion $\mathcal{C} = \mathcal{C}_{a,b}$ of an affine curve of the form

$$Y^2 - (X^3 + aX + b).$$

It is easily seen that, conversely, for any choice of $a_1, \ldots, a_6 \in K$ the curve $\mathcal{W}$, and
in particular $\mathcal{C}_{a,b}$, is irreducible over $K$ and that $O$, the unique point at infinity, is a flex.
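For any $u, r, s, t \in K$ with $u \ne 0$, the invertible linear change of variables

$$X = u^2X' + r, \qquad Y = u^3Y' + su^2X' + t$$

replaces the curve $\mathcal{W} = \mathcal{W}(a_1, \ldots, a_6)$ by a curve $\mathcal{W}' = \mathcal{W}'(a_1', \ldots, a_6')$ of the same
form. The numbering of the coefficients reflects the fact that if $r = s = t = 0$, then

$$a_1 = ua_1', \quad a_2 = u^2a_2', \quad a_3 = u^3a_3', \quad a_4 = u^4a_4', \quad a_6 = u^6a_6'.$$

In particular, for any nonzero $u \in K$, the invertible linear change of variables

$$X = u^2X', \qquad Y = u^3Y'$$

replaces $\mathcal{C}_{a,b}$ by $\mathcal{C}_{a',b'}$, where

$$a = u^4a', \qquad b = u^6b'.$$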

By replacing $X$ by $x + X$ and $Y$ by $y + Y$, we see that if a $K$-point $(x, y)$ of $\mathcal{C}_{a,b}$
is singular, then

$$3x^2 + a = y = 0,$$

which, together with $x^3 + ax + b = y^2 = 0$, implies $4a^3 + 27b^2 = 0$. Thus the curve $\mathcal{C}_{a,b}$ has no singular points if
$4a^3 + 27b^2 \ne 0$.
We will call

$$d := 4a^3 + 27b^2$$

the discriminant of the curve $\mathcal{C}_{a,b}$. It is not difficult to verify that if the cubic polynomial $X^3 + aX + b$ has roots $e_1, e_2, e_3$, then

$$d = -[(e_1 - e_2)(e_1 - e_3)(e_2 - e_3)]^2.$$

If $d = 0$, $a \ne 0$, then the polynomial $X^3 + aX + b$ has the repeated root
$x_0 = -3b/2a$ and $P = (x_0, 0)$ is the unique singular point. If $d = a = 0$, then
$b = 0$ and $P = (0, 0)$ is the unique singular point.
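The discriminant identity and the description of the singular point are easy to spot-check with exact rational arithmetic. In the Python sketch below (function names ours), the cubic $X^3 + aX + b$ is built from prescribed roots $e_1, e_2, e_3$ with $e_1 + e_2 + e_3 = 0$, which is forced because the coefficient of $X^2$ vanishes. The half-integer sample roots and the node example $a = -3$, $b = 2$ are arbitrary choices.

```python
from fractions import Fraction
from itertools import product

def cubic_from_roots(e1, e2):
    """(a, b, e3) with X^3 + aX + b = (X - e1)(X - e2)(X - e3), where e3 = -e1 - e2."""
    e3 = -e1 - e2
    a = e1 * e2 + e1 * e3 + e2 * e3
    b = -e1 * e2 * e3
    return a, b, e3

def discriminant(a, b):
    return 4 * a ** 3 + 27 * b ** 2

for e1, e2 in product([Fraction(k, 2) for k in range(-6, 7)], repeat=2):
    a, b, e3 = cubic_from_roots(e1, e2)
    assert discriminant(a, b) == -((e1 - e2) * (e1 - e3) * (e2 - e3)) ** 2

# A node: a = -3, b = 2 gives d = 0, and x0 = -3b/2a = 1 is a double root.
a, b = -3, 2
x0 = Fraction(-3 * b, 2 * a)
print(discriminant(a, b), x0, x0 ** 3 + a * x0 + b, 3 * x0 ** 2 + a)  # 0 1 0 0
```

Note the sign convention: with $d = 4a^3 + 27b^2$, three distinct real roots correspond to $d < 0$.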
Fig. 1. Cubic curves over $\mathbb{R}$: $\mathcal{C}_{a,b}: Y^2 = X^3 + aX + b$ ($a, b \in \mathbb{R}$, $d = 4a^3 + 27b^2$). Singular cases: node ($d = 0$, $a \ne 0$) and cusp ($d = a = 0$). Non-singular cases: $d < 0$ and $d > 0$.


The different types of curve which arise when $K = \mathbb{R}$ is the field of real numbers
are illustrated in Figure 1. The unique point at infinity $O$ may be thought of as being
at both ends of the $y$-axis. (In the case of a node, Figure 1 illustrates the situation for
$x_0 > 0$. For $x_0 < 0$ the singular point is an isolated point of the curve.)


Suppose now that $K$ is any field of characteristic $\ne 2, 3$ and that the curve $\mathcal{C}_{a,b}$ has
zero discriminant. Because of the geometrical interpretation when $K = \mathbb{R}$, the unique
singular point of the curve $\mathcal{C}_{a,b}$ is said to be a node if $a \ne 0$ and a cusp if $a = 0$.
In the cusp case, if we put $T = Y/X$, then the cubic curve has the parametrization
$X = T^2$, $Y = T^3$. In the node case, if we put $T = Y/(X + 3b/2a)$, then it has the
parametrization

$$X = T^2 + 3b/a, \qquad Y = T^3 + 9bT/2a.$$

Thus in both cases the cubic curve is in fact elementary.
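The two parametrizations can be checked with exact arithmetic as well. The Python sketch below (hypothetical helper names) takes $a = -3$, $b = 2$ for the node, so that $x_0 = 1$ and $X^3 + aX + b = (X - 1)^2(X + 2)$, and $a = b = 0$ for the cusp. It also confirms that $T = Y/(X + 3b/2a)$ recovers the parameter.

```python
from fractions import Fraction

def on_curve(a, b, x, y):
    return y * y == x ** 3 + a * x + b

def node_point(a, b, T):
    """X = T^2 + 3b/a, Y = T^3 + 9bT/2a (requires a != 0 and 4a^3 + 27b^2 = 0)."""
    return T * T + Fraction(3 * b, a), T ** 3 + Fraction(9 * b, 2 * a) * T

def cusp_point(T):
    """X = T^2, Y = T^3 on Y^2 = X^3."""
    return T * T, T ** 3

a, b = -3, 2
x0 = Fraction(-3 * b, 2 * a)
for k in range(-10, 11):
    T = Fraction(k, 3)
    X, Y = node_point(a, b, T)
    assert on_curve(a, b, X, Y)
    if X != x0:
        assert Y / (X - x0) == T      # T = Y/(X + 3b/2a) recovers the parameter
    assert on_curve(0, 0, *cusp_point(T))
print(node_point(a, b, Fraction(2)))  # (Fraction(2, 1), Fraction(2, 1))
```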

We now restrict attention to non-singular cubic curves, i.e. curves which do not 
have a singular point. 

Two K-points of a projective cubic curve determine a projective line, which inter- 
sects the curve in a third K-point. This procedure for generating additional K -points 
was used implicitly by Diophantus and explicitly by Newton. There is also another 
procedure, which may be regarded as a limiting case: the tangent to a projective cubic 
curve at a K-point intersects the curve in another K-point. The combination of the 
two procedures is known as the ‘chord and tangent’ process. It will now be described 
analytically for the cubic curve $\mathcal{C}_{a,b}$.

If $O$ is the unique point at infinity of the cubic curve $\mathcal{C}_{a,b}$, and if $P = (x, y)$ is any
finite $K$-point, then the affine line determined by $O$ and $P$ is $X - x$ and its other point
of intersection with $\mathcal{C}_{a,b}$ is $P^* = (x, -y)$.

Now let $P_1 = (x_1, y_1)$ and $P_2 = (x_2, y_2)$ be any two finite $K$-points. If $x_1 \ne x_2$,
then the affine line determined by $P_1$ and $P_2$ is

$$Y - mX - c,$$

where

$$m = (y_2 - y_1)/(x_2 - x_1), \qquad c = (y_1x_2 - y_2x_1)/(x_2 - x_1),$$

and its third point of intersection with $\mathcal{C}_{a,b}$ is $P_3 = (x_3, y_3)$, where (since $x_1 + x_2 + x_3 = m^2$ is the sum of the roots of $X^3 + aX + b - (mX + c)^2$)

$$x_3 = m^2 - x_1 - x_2, \qquad y_3 = mx_3 + c.$$

If $x_1 = x_2$, but $y_1 \ne y_2$, then the affine line determined by $P_1$ and $P_2$ is $X - x_1$, and its
other point of intersection with $\mathcal{C}_{a,b}$ is $O$. Finally, if $P_1 = P_2$ with $y_1 \ne 0$, it may be verified that
the tangent to $\mathcal{C}_{a,b}$ at $P_1$ is the affine line

$$Y - mX - c,$$

where

$$m = (3x_1^2 + a)/2y_1, \qquad c = (-x_1^3 + ax_1 + 2b)/2y_1,$$

and its other point of intersection with $\mathcal{C}_{a,b}$ is the point $P_3 = (x_3, y_3)$, where $x_3$ and
$y_3$ are given by the same formulas as before, but with the new values of $m$ and $c$ (and
with $x_2 = x_1$).

It is rather remarkable that the K-points of a non-singular projective cubic curve 
can be given the structure of an abelian group. That this is possible is suggested by the 
addition theorem for elliptic functions. 

Suppose that $K = \mathbb{C}$ is the field of complex numbers and that the cubic curve is
the projective completion $\mathcal{C}_\lambda$ of the affine curve

$$Y^2 - g_\lambda(X),$$

where

$$g_\lambda(X) = 4\lambda X^3 - 4(1 + \lambda)X^2 + 4X$$

is Riemann’s normal form and $\lambda \ne 0, 1$. If $S(u)$ is the elliptic function defined in
§3 of Chapter XII, then $P(u) = (S(u), S'(u))$ is a point of $\mathcal{C}_\lambda$ for any $u \in \mathbb{C}$. If we
define the sum of $P(u)$ and $P(v)$ to be the point $P(u + v)$, then the set of all $\mathbb{C}$-points
of $\mathcal{C}_\lambda$ becomes an abelian group, with $P(0) = (0, 0)$ as identity element and with
$P(-u) = (S(u), -S'(u))$ as the inverse of $P(u)$. In order to carry this construction
over to the cubic curve $\mathcal{C}_{a,b}$, and to other fields than $\mathbb{C}$, we interpret it geometrically.
It was shown in (10) of Chapter XII that


$$S(u + v) = 4S(u)S(v)[S(v) - S(u)]^2/[S'(u)S(v) - S'(v)S(u)]^2.$$

The points $(x_1, y_1) = (S(u), S'(u))$ and $(x_2, y_2) = (S(v), S'(v))$ determine the affine
line

$$Y - mX - c,$$

where

$$m = [S'(v) - S'(u)]/[S(v) - S(u)], \qquad c = [S'(u)S(v) - S'(v)S(u)]/[S(v) - S(u)].$$

The third point of intersection of this line with the cubic $\mathcal{C}_\lambda$ is the point $(x_3, y_3)$, where, since $x_1x_2x_3 = c^2/4\lambda$ is the product of the roots of $g_\lambda(X) - (mX + c)^2$,

$$\begin{aligned}
x_3 &= c^2/4\lambda x_1x_2 \\
&= [S'(u)S(v) - S'(v)S(u)]^2/4\lambda S(u)S(v)[S(v) - S(u)]^2 \\
&= 1/\lambda S(u + v).
\end{aligned}$$

On the other hand, the points $(0, 0) = (S(0), S'(0))$ and $(x_3', y_3') = (S(u + v), S'(u + v))$ determine the affine line $Y - (y_3'/x_3')X$, and its third point of intersection with $\mathcal{C}_\lambda$ is the point $(x_4, y_4)$, where $x_4 = 1/\lambda x_3' = x_3$. Evidently $y_4^2 = y_3^2$, and it
may be verified that actually $y_4 = y_3$. Thus $(x_3, y_3)$ is the third point of intersection
with $\mathcal{C}_\lambda$ of the line determined by the points $(0, 0)$ and $(x_3', y_3')$.

The origin $(0, 0)$ may not be a point of the cubic curve $\mathcal{C}_{a,b}$, but $O$, the point at
infinity, certainly is. Consequently, as illustrated in Figure 2, we now define the sum
$P_1 + P_2$ of two $K$-points $P_1, P_2$ of $\mathcal{C}_{a,b}$ to be the $K$-point $P_3^*$, where $P_3$ is the third
point of $\mathcal{C}_{a,b}$ on the line determined by $P_1, P_2$ and $P_3^*$ is the third point of $\mathcal{C}_{a,b}$ on the
line determined by $O, P_3$. If $P_1 = P_2$, the line determined by $P_1, P_2$ is understood to
mean the tangent to $\mathcal{C}_{a,b}$ at $P_1$.

It is simply a matter of elementary algebra to deduce from the formulas previously
given that, if addition is defined in this way, the set of all $K$-points of $\mathcal{C}_{a,b}$ becomes
an abelian group, with $O$ as identity element and with $-P = (x, -y)$ as the inverse of
$P = (x, y)$. Since $-P = P$ if and only if $y = 0$, the elements of order 2 in this group
are the points $(x_0, 0)$, where $x_0$ is a root of the polynomial $X^3 + aX + b$ (if it has any
roots in $K$).
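The chord-and-tangent formulas, together with the reflection $P_3 \mapsto P_3^*$ that defines the sum, fit in a few lines of Python. This is a minimal sketch over $\mathbb{Q}$ using exact fractions. Representing $O$ by `None` is our convention, and the curves $Y^2 = X^3 - 2$ (with its point $(3, 5)$) and $Y^2 = X^3 + 17$ serve only as test cases.

```python
from fractions import Fraction

O = None  # stands for the point at infinity

def add(P1, P2, a, b):
    """P1 + P2 on Y^2 = X^3 + aX + b over Q: the reflection (x3, -y3) of the third
    intersection point P3 = (x3, y3) of the chord (or tangent) through P1 and P2."""
    if P1 is O:
        return P2
    if P2 is O:
        return P1
    x1, y1 = Fraction(P1[0]), Fraction(P1[1])
    x2, y2 = Fraction(P2[0]), Fraction(P2[1])
    if x1 == x2 and (y1 != y2 or y1 == 0):
        return O  # the vertical line X - x1 meets the curve again only at O
    if x1 != x2:  # chord through P1 and P2
        m = (y2 - y1) / (x2 - x1)
        c = (y1 * x2 - y2 * x1) / (x2 - x1)
    else:         # P1 = P2 with y1 != 0: tangent at P1
        m = (3 * x1 ** 2 + a) / (2 * y1)
        c = (-x1 ** 3 + a * x1 + 2 * b) / (2 * y1)
    x3 = m * m - x1 - x2
    y3 = m * x3 + c
    return (x3, -y3)

P = (3, 5)               # a point of Y^2 = X^3 - 2
print(add(P, P, 0, -2))  # (Fraction(129, 100), Fraction(-383, 1000))
```

Testing commutativity and associativity on a handful of points (as in the checks accompanying this sketch) is a useful guard against sign slips, although of course it does not replace the algebraic verification.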

Throughout the preceding discussion of cubic curves we restricted attention to 
those with a flex. It will now be shown that in a sense this is no restriction. 
Fig. 2. Addition on $\mathcal{C}_{a,b}$.


Let $\mathcal{C}$ be a projective cubic curve over the field $K$, defined by the homogeneous
polynomial $F_1(X, Y, Z)$, and suppose that $\mathcal{C}$ has a non-singular $K$-point $P$. Without
loss of generality we assume that $P = (1, 0, 0)$ and that the tangent at $P$ is the projective line $Z$. Then $F_1$ has no term in $X^3$ or in $X^2Y$:

$$F_1(X, Y, Z) = aY^3 + bY^2Z + cYZ^2 + dZ^3 + eX^2Z + fXYZ + gXY^2 + hXZ^2.$$

Here $e \ne 0$, since $P$ is non-singular, and we may suppose $g \ne 0$, since otherwise $P$ is
a flex. If we replace $gX + aY$ by $X$, this assumes the form

$$F_2(X, Y, Z) = XY^2 + bY^2Z + cYZ^2 + dZ^3 + eX^2Z + gXYZ + hXZ^2,$$

with new values for the coefficients. If we now replace $X + bZ$ by $X$, this assumes the
form

$$F_3(X, Y, Z) = XY^2 + cYZ^2 + dZ^3 + eX^2Z + gXYZ + hXZ^2,$$

again with new values for the coefficients. The projective cubic curve $\mathcal{D}$ over the field
$K$, defined by the homogeneous polynomial

$$F_4(U, V, W) = VW^2 + cV^2W + dUV^2 + eU^3 + gUVW + hU^2V,$$

has a flex at the point $(0, 0, 1)$. Moreover,

$$F_3(U^2, VW, UV) = U^2VF_4(U, V, W), \qquad F_4(XZ, Z^2, XY) = XZ^2F_3(X, Y, Z).$$

This shows that any projective cubic curve over the field $K$ with a non-singular $K$-point
is birationally equivalent to one with a flex.
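The two polynomial identities can be spot-checked numerically, together with the fact that the maps $(U, V, W) \mapsto (U^2, VW, UV)$ and $(X, Y, Z) \mapsto (XZ, Z^2, XY)$ are mutually inverse up to the scalar factors $U^2V$ and $XZ^2$. The Python sketch below evaluates both sides at random integer arguments with random integer coefficients $c, d, e, g, h$. The names `phi` and `psi` are ours, and this is a sanity check, not a proof.

```python
import random

def F3(X, Y, Z, c, d, e, g, h):
    return X * Y**2 + c * Y * Z**2 + d * Z**3 + e * X**2 * Z + g * X * Y * Z + h * X * Z**2

def F4(U, V, W, c, d, e, g, h):
    return V * W**2 + c * V**2 * W + d * U * V**2 + e * U**3 + g * U * V * W + h * U**2 * V

def phi(X, Y, Z):   # (X, Y, Z) -> (U, V, W)
    return X * Z, Z * Z, X * Y

def psi(U, V, W):   # (U, V, W) -> (X, Y, Z)
    return U * U, V * W, U * V

rng = random.Random(2024)
for _ in range(1000):
    coeffs = [rng.randint(-9, 9) for _ in range(5)]
    X, Y, Z = (rng.randint(-50, 50) for _ in range(3))
    U, V, W = (rng.randint(-50, 50) for _ in range(3))
    assert F3(*psi(U, V, W), *coeffs) == U**2 * V * F4(U, V, W, *coeffs)
    assert F4(*phi(X, Y, Z), *coeffs) == X * Z**2 * F3(X, Y, Z, *coeffs)
    assert psi(*phi(X, Y, Z)) == (X * Z**2 * X, X * Z**2 * Y, X * Z**2 * Z)
    assert phi(*psi(U, V, W)) == (U**2 * V * U, U**2 * V * V, U**2 * V * W)
print("all identities hold at the sampled points")
```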
Birational equivalence may be defined in the following way. A rational transformation of the projective plane with points $X = (X_1, X_2, X_3)$ is a map $X \mapsto Y = \varphi(X)$,
where

$$\varphi(X) = (\varphi_1(X), \varphi_2(X), \varphi_3(X))$$

and $\varphi_1, \varphi_2, \varphi_3$ are homogeneous polynomials without common factor of the same
degree $m$, say. (In the corresponding affine plane the coordinates are transformed by
rational functions.) The transformation is birational if there exists an inverse map
$Y \mapsto X = \psi(Y)$, where

$$\psi(Y) = (\psi_1(Y), \psi_2(Y), \psi_3(Y))$$

and $\psi_1, \psi_2, \psi_3$ are homogeneous polynomials without common factor of the same
degree $n$, say, such that

$$\psi[\varphi(X)] = \omega(X)X, \qquad \varphi[\psi(Y)] = \theta(Y)Y$$

for some scalar polynomials $\omega(X), \theta(Y)$. Two irreducible projective plane curves $\mathcal{C}$
and $\mathcal{D}$ over the field $K$, defined respectively by the homogeneous polynomials $F(X)$
and $G(Y)$ (not necessarily of the same degree), are birationally equivalent if there
exists a birational transformation $Y = \varphi(X)$ with inverse $X = \psi(Y)$ such that $G[\varphi(X)]$
is divisible by $F(X)$ and $F[\psi(Y)]$ is divisible by $G(Y)$.

It is clear that birational equivalence is indeed an equivalence relation, and that 
irreducible projective curves which are projectively equivalent are also birationally 
equivalent. Birational transformations are often used to simplify the singular points 
of a curve. Indeed the theorem on resolution of singularities says that any irreducible 
curve is birationally equivalent to a non-singular curve, although it may be a curve in 
a higher-dimensional space rather than in the plane. The algebraic geometry of curves 
may be regarded as the study of those properties which are invariant under birational 
equivalence. 

It was shown by Poincaré (1901) that any non-singular curve of genus 1 defined 
over the field Q of rational numbers and with at least one rational point is birationally 
equivalent over Q to a cubic curve. Such a curve is now said to be an elliptic curve 
(for the somewhat inadequate reason that it may be parametrized by elliptic functions 
over the field of complex numbers). However, for our purposes it is sufficient to define an elliptic curve to be a non-singular cubic curve of the form $\mathcal{W}$, over a field $K$ of arbitrary characteristic, or of the form $\mathcal{C}_{a,b}$, over a field $K$ of characteristic $\ne 2, 3$.


4 Mordell’s Theorem 


We showed in the previous section that, for any field $K$ of characteristic $\ne 2, 3$, the $K$-points of the elliptic curve $\mathcal{C}_{a,b}$ defined by the polynomial

$$Y^2 - (X^3 + aX + b),$$

where $a, b \in K$ and $d := 4a^3 + 27b^2 \ne 0$, form an abelian group, $E(K)$ say. We now restrict our attention to the case when $K = \mathbb{Q}$ is the field of rational numbers, and we write simply $E := E(\mathbb{Q})$. This section is devoted to the basic theorem of Mordell (1922), which says that the abelian group $E$ is finitely generated.

By replacing $X$ by $X/c^2$ and $Y$ by $Y/c^3$ for some nonzero $c \in \mathbb{Q}$, we may (and will) assume that $a$ and $b$ are both integers. Let $P = (x, y)$ be any finite rational point of $\mathcal{C}_{a,b}$ and write $x = p/q$, where $p$ and $q$ are coprime integers. The height $h(P)$ of $P$ is uniquely defined by

$$h(P) = \log \max(|p|, |q|).$$

We also set $h(O) = 0$, where $O$ is the unique point at infinity of $\mathcal{C}_{a,b}$.

Evidently $h(P) \ge 0$. Furthermore, $h(-P) = h(P)$, since $P = (x, y)$ implies $-P = (x, -y)$. Also, for any $r > 0$, there exist only finitely many elements $P = (x, y)$ of $E$ with $h(P) < r$, since there are only finitely many rationals $x = p/q$ with $\max(|p|, |q|) < e^r$, and $x$ determines $y$ up to sign.
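The height depends only on the reduced form of the first coordinate, so it is easy to compute. The following is a minimal sketch, assuming Python's standard `fractions.Fraction` (which always stores a rational in lowest terms with positive denominator); the function name `naive_height` and the use of `None` for the point at infinity $O$ are illustrative choices, not notation from the text.

```python
from fractions import Fraction
from math import log


def naive_height(x):
    """Return h(P) = log max(|p|, |q|) for a finite point P = (x, y) with x = p/q.

    Passing None stands for the point at infinity O, for which h(O) = 0.
    """
    if x is None:
        return 0.0
    x = Fraction(x)  # Fraction reduces p/q to lowest terms automatically
    return log(max(abs(x.numerator), abs(x.denominator)))


# The point (3, 5) lies on y^2 = x^3 - 2; P and -P = (3, -5) share the x-coordinate,
# so both have height log 3.
print(naive_height(3))
print(naive_height(Fraction(-6, 4)))  # -6/4 = -3/2, so this is also log 3
print(naive_height(None))             # 0.0
```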

Proposition 8 There exists a constant $C = C(a, b) > 0$ such that $|h(2P) - 4h(P)| \le C$ for all $P \in E$.

Proof By the formulas given in §3, if $P = (x, y)$, then $2P = (x', y')$, where

$$x' = m^2 - 2x, \qquad m = (3x^2 + a)/2y.$$

Since $y^2 = x^3 + ax + b$, it follows that

$$x' = (x^4 - 2ax^2 - 8bx + a^2)/4(x^3 + ax + b).$$

If $x = p/q$, where $p$ and $q$ are coprime integers, then $x' = p'/q'$, where

$$p' = p^4 - 2ap^2q^2 - 8bpq^3 + a^2q^4, \qquad q' = 4q(p^3 + apq^2 + bq^3).$$

Evidently $p'$ and $q'$ are also integers, but they need not be coprime. However, since

$$p' = ep'', \qquad q' = eq'',$$

where $e, p'', q''$ are integers and $p'', q''$ are coprime, we have

$$h(2P) = \log\max(|p''|, |q''|) \le \log\max(|p'|, |q'|).$$

Since

$$\max(|p'|, |q'|) \le \max(|p|, |q|)^4 \max\{1 + 2|a| + 8|b| + a^2,\ 4(1 + |a| + |b|)\},$$

it follows that

$$h(2P) \le 4h(P) + C'$$

for some constant $C' = C'(a, b) > 0$.
The Euclidean algorithm may be used to derive the polynomial identity

$$(3X^2 + 4a)(X^4 - 2aX^2 - 8bX + a^2) - (3X^3 - 5aX - 27b)(X^3 + aX + b) = d,$$

where once again $d = 4a^3 + 27b^2$. Substituting $p/q$ for $X$, we obtain

$$4dq^7 = 4(3p^2q + 4aq^3)p' - (3p^3 - 5apq^2 - 27bq^3)q'.$$

Similarly, the Euclidean algorithm may be used to derive the polynomial identity

$$f(X)(1 - 2aX^2 - 8bX^3 + a^2X^4) + g(X)\,X(1 + aX^2 + bX^3) = d,$$

where

$$f(X) = 4a^3 + 27b^2 - a^2bX + a(3a^3 + 22b^2)X^2 + 3b(a^3 + 8b^2)X^3,$$
$$g(X) = a^2b + a(5a^3 + 32b^2)X + 2b(13a^3 + 96b^2)X^2 - 3a^2(a^3 + 8b^2)X^3.$$

Substituting $q/p$ for $X$, we obtain

$$\begin{aligned} 4dp^7 = {} & 4\{(4a^3 + 27b^2)p^3 - a^2bp^2q + (3a^4 + 22ab^2)pq^2 + 3(a^3b + 8b^3)q^3\}p' \\ & + \{a^2bp^3 + (5a^4 + 32ab^2)p^2q + (26a^3b + 192b^3)pq^2 - 3(a^5 + 8a^2b^2)q^3\}q'. \end{aligned}$$

Since $d \ne 0$, it follows from these two relations that

$$\max(|p|, |q|)^7 \le C_1 \max(|p|, |q|)^3 \max(|p'|, |q'|)$$

and hence

$$\max(|p|, |q|)^4 \le C_1 \max(|p'|, |q'|).$$

But the two relations also show that the greatest common divisor $e$ of $p'$ and $q'$ divides both $4dq^7$ and $4dp^7$, and hence also $4d$, since $p$ and $q$ are coprime. Consequently

$$\max(|p'|, |q'|) \le 4|d| \max(|p''|, |q''|).$$

Combining this with the previous inequality, we obtain

$$4h(P) \le h(2P) + C''$$

for some constant $C'' = C''(a, b) > 0$. This proves the result, with $C = \max(C', C'')$.
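The two polynomial identities are easy to mistype, so it may help to check them mechanically. The sketch below (plain Python integers and `fractions.Fraction`; the helper names are hypothetical) verifies both identities at many integer points and checks the formulas for $p'$, $q'$ on two small examples chosen for illustration: the point $(3, 5)$ on $Y^2 = X^3 - 2$, where $p' = 129$ and $q' = 100$ are already coprime, and the point $(-2, 3)$ on $Y^2 = X^3 + 17$, where $e = \gcd(p', q') = 36$ does divide $4d = 31212$.

```python
from fractions import Fraction
from math import gcd


def double_x(x, a, b):
    """x-coordinate of 2P on y^2 = x^3 + a x + b, from x' = (x^4 - 2ax^2 - 8bx + a^2) / 4(x^3 + ax + b)."""
    x = Fraction(x)
    return (x**4 - 2*a*x**2 - 8*b*x + a*a) / (4 * (x**3 + a*x + b))


def p_q_prime(p, q, a, b):
    """The integers p', q' of the proof (not necessarily coprime), for x = p/q."""
    pp = p**4 - 2*a*p**2*q**2 - 8*b*p*q**3 + a*a*q**4
    qq = 4*q*(p**3 + a*p*q**2 + b*q**3)
    return pp, qq


def first_identity(X, a, b):
    # (3X^2 + 4a)(X^4 - 2aX^2 - 8bX + a^2) - (3X^3 - 5aX - 27b)(X^3 + aX + b), which should equal d
    return (3*X**2 + 4*a) * (X**4 - 2*a*X**2 - 8*b*X + a*a) - (3*X**3 - 5*a*X - 27*b) * (X**3 + a*X + b)


def second_identity(X, a, b):
    # f(X)(1 - 2aX^2 - 8bX^3 + a^2X^4) + g(X) X (1 + aX^2 + bX^3), which should also equal d
    f = 4*a**3 + 27*b**2 - a*a*b*X + a*(3*a**3 + 22*b*b)*X**2 + 3*b*(a**3 + 8*b*b)*X**3
    g = a*a*b + a*(5*a**3 + 32*b*b)*X + 2*b*(13*a**3 + 96*b*b)*X**2 - 3*a*a*(a**3 + 8*b*b)*X**3
    return f * (1 - 2*a*X**2 - 8*b*X**3 + a*a*X**4) + g * X * (1 + a*X**2 + b*X**3)


for a in range(-3, 4):
    for b in range(-3, 4):
        d = 4*a**3 + 27*b**2
        assert all(first_identity(X, a, b) == d == second_identity(X, a, b) for X in range(-5, 6))

print(p_q_prime(3, 1, 0, -2), double_x(3, 0, -2))   # (129, 100) 129/100
pp, qq = p_q_prime(-2, 1, 0, 17)
print(pp, qq, gcd(pp, qq))                          # 288 36 36
```

Since both relations are identities in $a$, $b$ and $X$, checking them at a grid of integer values is only a sanity check, not a proof.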


Proposition 9 There exists a unique function $\hat{h}: E \to \mathbb{R}$ such that

(i) $\hat{h} - h$ is bounded,
(ii) $\hat{h}(2P) = 4\hat{h}(P)$ for every $P \in E$.

Furthermore, it is given by the formula $\hat{h}(P) = \lim_{n \to \infty} h(2^nP)/4^n$.

Proof Suppose $\hat{h}$ has the properties (i), (ii). Then, by (ii), $4^n\hat{h}(P) = \hat{h}(2^nP)$ and hence, by (i), $4^n\hat{h}(P) - h(2^nP)$ is bounded. Dividing by $4^n$, we see that $h(2^nP)/4^n \to \hat{h}(P)$ as $n \to \infty$. This proves uniqueness.

To prove existence, choose $C$ as in the statement of Proposition 8. Then, for any integers $m, n$ with $n > m \ge 0$,

$$\begin{aligned} |4^{-n}h(2^nP) - 4^{-m}h(2^mP)| &= \Big|\sum_{j=m}^{n-1} 4^{-j-1}\big(h(2^{j+1}P) - 4h(2^jP)\big)\Big| \\ &\le \sum_{j=m}^{n-1} 4^{-j-1}\big|h(2^{j+1}P) - 4h(2^jP)\big| \\ &\le \sum_{j=m}^{n-1} 4^{-j-1}C \le C4^{-m}/3. \end{aligned}$$

Thus the sequence $\{4^{-n}h(2^nP)\}$ is a fundamental sequence and consequently convergent. If we denote its limit by $\hat{h}(P)$, then clearly $\hat{h}(2P) = 4\hat{h}(P)$. On the other hand, by taking $m = 0$ and letting $n \to \infty$ in the preceding inequality we obtain

$$|\hat{h}(P) - h(P)| \le C/3.$$

Thus $\hat{h}$ has both the required properties.
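Proposition 9 also suggests how $\hat{h}(P)$ can be approximated in practice: double $P$ repeatedly and divide the height by $4^n$, the error being at most $C/(3 \cdot 4^n)$. Here is a minimal sketch with exact rational arithmetic; the function names are hypothetical, and the curve $Y^2 = X^3 - 2$ with $P = (3, 5)$ is just a convenient example. For this curve, working through the constants in the proof of Proposition 8 gives an admissible constant $C < 8$, which the test below relies on.

```python
from fractions import Fraction
from math import log


def naive_height(x):
    """h(P) = log max(|p|, |q|), where x = p/q in lowest terms."""
    x = Fraction(x)
    return log(max(abs(x.numerator), abs(x.denominator)))


def double(P, a, b):
    """Tangent-line doubling of a finite point P = (x, y), y != 0, on y^2 = x^3 + a x + b."""
    x, y = P
    m = (3*x*x + a) / (2*y)      # slope of the tangent at P
    x2 = m*m - 2*x
    y2 = m*(x - x2) - y          # reflect the third intersection point in the x-axis
    return (x2, y2)


def canonical_height_approx(P, a, b, n):
    """Return h(2^n P) / 4^n; by Proposition 9 this is within C / (3 * 4^n) of the canonical height."""
    Q = (Fraction(P[0]), Fraction(P[1]))
    for _ in range(n):
        Q = double(Q, a, b)
    return naive_height(Q[0]) / 4**n


for n in range(6):
    print(n, canonical_height_approx((3, 5), 0, -2, n))
```

Exact `Fraction` arithmetic is used because the numerators and denominators of $2^nP$ grow roughly like $4^n$ digits, far beyond floating-point precision.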


The value $\hat{h}(P)$ is called the canonical height of the rational point $P$. The formula for $\hat{h}(P)$ shows that, for all $P \in E$,

$$\hat{h}(-P) = \hat{h}(P) \ge 0.$$

Moreover, by Proposition 9(i), for any $r > 0$ there exist only finitely many elements $P$ of $E$ with $\hat{h}(P) < r$.
It will now be shown that the canonical height satisfies the parallelogram law:


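Proposition 10 For all $P_1, P_2 \in E$,

$$\hat{h}(P_1 + P_2) + \hat{h}(P_1 - P_2) = 2\hat{h}(P_1) + 2\hat{h}(P_2).$$

Proof It is sufficient to show that there exists a constant $C' > 0$ such that, for all $P_1, P_2 \in E$,

$$h(P_1 + P_2) + h(P_1 - P_2) \le 2h(P_1) + 2h(P_2) + C'. \qquad (*)$$

For it then follows from the formula in Proposition 9 (apply (*) to $2^nP_1, 2^nP_2$, divide by $4^n$ and let $n \to \infty$) that, for all $P_1, P_2 \in E$,

$$\hat{h}(P_1 + P_2) + \hat{h}(P_1 - P_2) \le 2\hat{h}(P_1) + 2\hat{h}(P_2).$$

But, replacing $P_1$ by $P_1 + P_2$ and $P_2$ by $P_1 - P_2$, we also have

$$\hat{h}(2P_1) + \hat{h}(2P_2) \le 2\hat{h}(P_1 + P_2) + 2\hat{h}(P_1 - P_2)$$

and hence, by Proposition 9(ii),

$$2\hat{h}(P_1) + 2\hat{h}(P_2) \le \hat{h}(P_1 + P_2) + \hat{h}(P_1 - P_2).$$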
To prove (*) we may evidently assume that $P_1 = (x_1, y_1)$ and $P_2 = (x_2, y_2)$ are both finite. Moreover, by Proposition 8, we may assume that $P_1 \ne \pm P_2$, i.e. $x_1 \ne x_2$. Then, by the formulas of §3,

$$P_1 + P_2 = (x_3, y_3), \qquad P_1 - P_2 = (x_4, y_4),$$

where

$$x_3 = (y_2 - y_1)^2/(x_2 - x_1)^2 - (x_1 + x_2),$$
$$x_4 = (y_2 + y_1)^2/(x_2 - x_1)^2 - (x_1 + x_2).$$

Hence

$$x_3 + x_4 = 2[y_1^2 + y_2^2 - (x_2 - x_1)(x_2^2 - x_1^2)]/(x_2 - x_1)^2$$

and

$$x_3x_4 = (y_2^2 - y_1^2)^2/(x_2 - x_1)^4 - 2(x_1 + x_2)(y_1^2 + y_2^2)/(x_2 - x_1)^2 + (x_1 + x_2)^2.$$

Since $y_j^2 = x_j^3 + ax_j + b$ $(j = 1, 2)$, these relations simplify to

$$x_3 + x_4 = 2[x_1x_2(x_1 + x_2) + a(x_1 + x_2) + 2b]/(x_2 - x_1)^2$$

and

$$x_3x_4 = N/(x_2 - x_1)^2,$$

where

$$\begin{aligned} N &= (x_1^2 + x_1x_2 + x_2^2 + a)^2 - 2(x_1 + x_2)^2(x_1^2 - x_1x_2 + x_2^2 + a) - 4b(x_1 + x_2) + (x_2^2 - x_1^2)^2 \\ &= (x_1x_2 - a)^2 - 4b(x_1 + x_2). \end{aligned}$$

Put $x_j = p_j/q_j$, where $(p_j, q_j) = 1$ $(1 \le j \le 4)$. Then $x_3, x_4$ are the roots of the quadratic polynomial

$$AX^2 - BX + C$$

with integer coefficients

$$A = (p_2q_1 - p_1q_2)^2,$$
$$B = 2(p_1p_2 + aq_1q_2)(p_1q_2 + p_2q_1) + 4bq_1^2q_2^2,$$
$$C = (p_1p_2 - aq_1q_2)^2 - 4bq_1q_2(p_1q_2 + p_2q_1).$$

Consequently

$$Ap_3p_4 = Cq_3q_4, \qquad A(p_3q_4 + p_4q_3) = Bq_3q_4.$$
By Proposition II.16, $q_3$ and $q_4$ each divide $A$, and so their product divides $A^2$. Hence, for some integer $D \ne 0$,

$$A^2 = Dq_3q_4, \qquad AC = Dp_3p_4, \qquad AB = D(p_3q_4 + p_4q_3).$$

But it is easily seen that $q_3q_4$, $p_3p_4$ and $p_3q_4 + p_4q_3$ have no common prime divisor. It follows that $A$ divides $D$.
Hence, if we put

$$\rho_j = \max(|p_j|, |q_j|) \qquad (1 \le j \le 4),$$

then

$$|q_3q_4| \le |A| \le 4\rho_1^2\rho_2^2,$$
$$|p_3p_4| \le |C| \le [(1 + |a|)^2 + 8|b|]\rho_1^2\rho_2^2,$$
$$|p_3q_4 + p_4q_3| \le |B| \le 4(1 + |a| + |b|)\rho_1^2\rho_2^2.$$

But

$$\max(|p_3|, |q_3|)\max(|p_4|, |q_4|) \le \max(|p_3p_4|,\ |q_3q_4| + |p_3q_4 + p_4q_3|),$$

since if $|q_3| \le |p_3|$ and $|p_4| \le |q_4|$, for example, then

$$|p_3q_4| \le |p_4q_3| + |p_3q_4 + p_4q_3| \le |q_3q_4| + |p_3q_4 + p_4q_3|.$$

It follows that there exists a constant $C'' > 0$ such that

$$\rho_3\rho_4 \le C''\rho_1^2\rho_2^2,$$

which is equivalent to (*) with $C' = \log C''$.
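The parallelogram law can be observed numerically with the approximation of Proposition 9. The sketch below uses, as an illustrative assumption, the curve $Y^2 = X^3 + 17$ and the points $P_1 = (-2, 3)$, $P_2 = (-1, 4)$, for which $P_1 + P_2 = (4, -9)$ and $P_1 - P_2 = (52, 375)$. The helper names are hypothetical and `None` stands for $O$. Each approximate canonical height is within $C/(3 \cdot 4^n)$ of the true value, so the two sides should agree up to a small error.

```python
from fractions import Fraction
from math import log


def add(P, Q, a, b):
    """Chord-and-tangent addition on y^2 = x^3 + a x + b; None stands for the point O at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2:
        if y1 == -y2:                # P + (-P) = O (this also covers points of order 2)
            return None
        m = (3*x1*x1 + a) / (2*y1)   # tangent slope
    else:
        m = (y2 - y1) / (x2 - x1)    # chord slope
    x3 = m*m - x1 - x2
    return (x3, m*(x1 - x3) - y1)


def neg(P):
    return None if P is None else (P[0], -P[1])


def hhat(P, a, b, n=6):
    """Approximate canonical height h(2^n P) / 4^n of a point of infinite order."""
    Q = P
    for _ in range(n):
        Q = add(Q, Q, a, b)
    x = Q[0]
    return log(max(abs(x.numerator), abs(x.denominator))) / 4**n


a, b = 0, 17
P1, P2 = (Fraction(-2), Fraction(3)), (Fraction(-1), Fraction(4))
lhs = hhat(add(P1, P2, a, b), a, b) + hhat(add(P1, neg(P2), a, b), a, b)
rhs = 2*hhat(P1, a, b) + 2*hhat(P2, a, b)
print(lhs, rhs)
```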
Corollary 11 For any $P \in E$ and any integer $n$,

$$\hat{h}(nP) = n^2\hat{h}(P).$$

Proof Since $\hat{h}(-P) = \hat{h}(P)$, we may assume $n \ge 0$. We may actually assume $n \ge 3$, since the result is trivial for $n = 0, 1$ and it holds for $n = 2$ by Proposition 9. By Proposition 10 we have

$$\hat{h}(nP) + \hat{h}((n - 2)P) = 2\hat{h}((n - 1)P) + 2\hat{h}(P),$$

from which the general case follows by induction.

It follows from Corollary 11 that if an element $P$ of the group $E$ has finite order, then $\hat{h}(P) = 0$. The converse is also true. In fact, by Proposition 10, the set of all $P \in E$ such that $\hat{h}(P) = 0$ is a subgroup of $E$, and this subgroup is finite since there are only finitely many points $P$ such that $\hat{h}(P) < 1$.

We now deduce from Proposition 10 that a non-negative quadratic form can be constructed from the canonical height. If we put

$$\langle P, Q \rangle = \hat{h}(P + Q) - \hat{h}(P) - \hat{h}(Q),$$

then evidently

$$\langle P, Q \rangle = \langle Q, P \rangle, \qquad \langle P, P \rangle = 2\hat{h}(P) \ge 0.$$

It remains to show that

$$\langle P, Q + R \rangle = \langle P, Q \rangle + \langle P, R \rangle,$$

and we do this by proving that

$$\hat{h}(P + Q + R) = \hat{h}(P + Q) + \hat{h}(P + R) + \hat{h}(Q + R) - \hat{h}(P) - \hat{h}(Q) - \hat{h}(R).$$

But, by the parallelogram law,

$$\hat{h}(P + Q + R + P) + \hat{h}(Q + R) = \hat{h}(P + Q + R + P) + \hat{h}(P + Q + R - P) = 2\hat{h}(P + Q + R) + 2\hat{h}(P)$$

and

$$\hat{h}(P + Q + R + P) + \hat{h}(Q - R) = \hat{h}(P + Q + P + R) + \hat{h}(P + Q - P - R) = 2\hat{h}(P + Q) + 2\hat{h}(P + R).$$

Subtracting the second relation from the first, we obtain

$$\hat{h}(Q + R) - \hat{h}(Q - R) = 2\hat{h}(P + Q + R) + 2\hat{h}(P) - 2\hat{h}(P + Q) - 2\hat{h}(P + R).$$

Since, by the parallelogram law again,

$$\hat{h}(Q + R) + \hat{h}(Q - R) = 2\hat{h}(Q) + 2\hat{h}(R),$$

this is equivalent to what we wished to prove.


Proposition 12 The abelian group $E$ is finitely generated if, for some integer $m > 1$, the factor group $E/mE$ is finite.

Proof Let $S$ be a set of representatives of the cosets of the subgroup $mE$. Since $S$ is finite, by hypothesis, we can choose $C > 0$ so that $\hat{h}(Q) < C$ for all $Q \in S$. The set

$$S' = \{Q' \in E : \hat{h}(Q') < C\}$$

contains $S$ and is also finite. We will show that it generates $E$.
Let $E'$ be the subgroup of $E$ generated by the elements of $S'$. If $E' \ne E$, choose $P \in E \setminus E'$ so that $\hat{h}(P)$ is minimal. Then

$$P = mP_1 + Q_1 \quad \text{for some } P_1 \in E \text{ and } Q_1 \in S.$$

Since

$$\hat{h}(P + Q_1) + \hat{h}(P - Q_1) = 2\hat{h}(P) + 2\hat{h}(Q_1),$$

it follows that

$$\hat{h}(mP_1) = \hat{h}(P - Q_1) < 2\hat{h}(P) + 2C$$

and hence, by Corollary 11,

$$\hat{h}(P_1) < 2[\hat{h}(P) + C]/m^2 \le [\hat{h}(P) + C]/2.$$

But $P_1 \notin E'$, since $P \notin E'$, and hence $\hat{h}(P_1) \ge \hat{h}(P)$. It follows that $\hat{h}(P) < C$, so that $P \in S' \subseteq E'$, which is a contradiction. Hence $E' = E$.


Proposition 12 shows that to complete the proof of Mordell’s theorem it is enough 
to show that the factor group E/2E is finite. We will prove this only for the case when 
E contains an element of order 2. A similar proof may be given for the general case, 
but it requires some knowledge of algebraic number theory. 

The assumption that $E$ contains an element of order 2 means that there is a rational point $(x_0, 0)$, where $x_0$ is a root of the polynomial $X^3 + aX + b$. Since $a$ and $b$ are taken to be integers, and the polynomial has highest coefficient 1, $x_0$ must also be an integer. By changing variable from $X$ to $x_0 + X$, we replace the cubic $\mathcal{C}_{a,b}$ by a cubic $\mathcal{C}_{A,B}$ defined by a polynomial

$$Y^2 - (X^3 + AX^2 + BX),$$

where $A, B \in \mathbb{Z}$. The non-singularity condition $d := 4a^3 + 27b^2 \ne 0$ becomes

$$D := B^2(4B - A^2) \ne 0,$$

but this is the only restriction on $A, B$. The chord joining two rational points of $\mathcal{C}_{A,B}$ is given by the same formulas as for $\mathcal{C}_{a,b}$ in §3, but the tangent to $\mathcal{C}_{A,B}$ at the finite point $P_1 = (x_1, y_1)$ is now the affine line

$$Y = mX + c,$$

where

$$m = (3x_1^2 + 2Ax_1 + B)/2y_1, \qquad c = -x_1(x_1^2 - B)/2y_1.$$

The geometrical interpretation of the group law remains the same as before. We will now denote by $E$ the group of all rational points of $\mathcal{C}_{A,B}$. Our change of variable has made the point $N = (0, 0)$ an element of $E$ of order 2.

Let $P = (x, y)$ be a rational point of $\mathcal{C}_{A,B}$ with $x \ne 0$. We are going to show that, in a sense which will become clear, there are only finitely many rational square classes to which $x$ can belong.

Write $x = m/n$, $y = p/q$, where $m, n, p, q$ are integers with $n, q > 0$ and $(m, n) = (p, q) = 1$. Then

$$p^2n^3 = (m^3 + Am^2n + Bmn^2)q^2,$$

which implies both $q^2 \mid n^3$ and $n^3 \mid q^2$. Thus $n^3 = q^2$. From $n^2 \mid q^2$ we obtain $n \mid q$. Hence $q = en$ for some integer $e$, and it follows that $n = e^2$, $q = e^3$. Thus

$$x = m/e^2, \quad y = p/e^3, \quad \text{where } e > 0 \text{ and } (m, e) = (p, e) = 1.$$

Moreover,

$$p^2 = m(m^2 + Ame^2 + Be^4).$$

This shows that each prime which divides $m$, but not $m^2 + Ame^2 + Be^4$, must occur to an even power in $m$. On the other hand, each prime which divides both $m$ and $m^2 + Ame^2 + Be^4$ must also divide $B$, since $(m, e) = 1$. Consequently we can write

$$x = \pm p_1^{\varepsilon_1} \cdots p_k^{\varepsilon_k}(u/e)^2,$$

where $u \in \mathbb{N}$, $p_1, \ldots, p_k$ are the distinct primes dividing $B$ and $\varepsilon_j \in \{0, 1\}$ $(1 \le j \le k)$. Hence there are at most $2^{k+1}$ rational square classes to which $x$ can belong.
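For a concrete curve the square class of $x$ can be read off from the squarefree part of $x$. The sketch below assumes, for illustration only, the curve $Y^2 = X^3 - 36X$ (so $A = 0$, $B = -36$, and the primes dividing $B$ are 2 and 3) and a few of its rational points; `squarefree_part` is a hypothetical helper using naive trial division. Every class that occurs is $\pm$ a product of distinct primes dividing $B$, as the argument predicts.

```python
from fractions import Fraction


def squarefree_part(x):
    """Signed squarefree integer s with x = s * r^2 for some rational r (x a nonzero rational)."""
    x = Fraction(x)
    n = x.numerator * x.denominator   # x = n / denominator^2 has the same square class as n
    sign = -1 if n < 0 else 1
    n, s, p = abs(n), 1, 2
    while p * p <= n:                  # naive trial division
        while n % (p * p) == 0:
            n //= p * p
        if n % p == 0:
            s *= p
            n //= p
        p += 1
    return sign * s * n                # any leftover n > 1 is a prime occurring once


A, B = 0, -36
points = [(-3, 9), (12, 36), (-2, 8), (18, 72), (6, 0), (-6, 0),
          (Fraction(25, 4), Fraction(-35, 8))]
for x, y in points:
    assert y * y == x**3 + A * x * x + B * x       # the point really lies on the curve
    s = squarefree_part(x)
    print(x, s, B % s == 0)                         # s divides B: a product of primes dividing B
```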

Suppose now that $P_1 = (x_1, y_1)$ and $P_2 = (x_2, y_2)$ are distinct rational points of $\mathcal{C}_{A,B}$ for which $x_1x_2$ is a nonzero rational square, and let $P_3 = (x_3, y_3)$ be the third point of intersection with $\mathcal{C}_{A,B}$ of the line $Y = mX + c$ through $P_1$ and $P_2$. Then $x_1, x_2, x_3$ are the three roots of the cubic equation

$$(mX + c)^2 = X^3 + AX^2 + BX.$$

From the constant term we see that $x_1x_2x_3 = c^2$. It follows that $x_3$ is a nonzero rational square if $c \ne 0$. If $c = 0$, then $P_3 = N$ and $x_1x_2 = B$.

Suppose next that $P = (x, y)$ is any rational point of $\mathcal{C}_{A,B}$ with $x \ne 0$, and let $2P = (\bar{x}, -\bar{y})$. Then $\bar{P} = (\bar{x}, \bar{y})$ is the other point of intersection with $\mathcal{C}_{A,B}$ of the tangent to $\mathcal{C}_{A,B}$ at $P$. By the same argument as before, $x^2\bar{x} = c^2$. Hence $\bar{x}$ is a nonzero rational square if $c \ne 0$. If $c = 0$, then $2P = N$ and $x^2 = B$.
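The tangent computation can be checked on the same example curve $Y^2 = X^3 - 36X$, which is an assumption made only for illustration. The sketch computes $m$ and $c$ from the tangent formulas given earlier, finds the third intersection $\bar{x} = m^2 - A - 2x$ from the sum of the roots of the cubic, and confirms that $x^2\bar{x} = c^2$, so that $\bar{x} = (c/x)^2$ is a nonzero rational square when $c \ne 0$. The helper name is hypothetical.

```python
from fractions import Fraction


def tangent_third_x(P, A, B):
    """For a finite point P = (x, y), y != 0, on y^2 = x^3 + A x^2 + B x, return (m, c, xbar),
    where Y = mX + c is the tangent at P and xbar is the x-coordinate of its third intersection."""
    x, y = map(Fraction, P)
    m = (3*x*x + 2*A*x + B) / (2*y)
    c = -x * (x*x - B) / (2*y)
    xbar = m*m - A - 2*x          # the roots of (mX + c)^2 = X^3 + AX^2 + BX are x, x, xbar
    return m, c, xbar


A, B = 0, -36
for P in [(-3, 9), (12, 36), (-2, 8), (18, 72)]:
    m, c, xbar = tangent_third_x(P, A, B)
    x = Fraction(P[0])
    assert x * x * xbar == c * c          # product of the roots
    print(P, xbar, xbar == (c / x) ** 2)
```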

To deduce that $E/2E$ is finite from these observations we will use an arithmetic analogue of Landen's transformation. We saw in Chapter XII that, over the field $\mathbb{C}$ of complex numbers, the cubic curve $\mathcal{C}_\lambda$ defined by the polynomial $Y^2 - g_\lambda(X)$, where $g_\lambda(X) = 4X(1 - X)(1 - \lambda X)$, admits the parametrization

$$X = s(u, \lambda), \qquad Y = s'(u, \lambda).$$

It follows from Proposition XII.11 that the cubic curve $\mathcal{C}_{\lambda'}$, where $\lambda'$ is given by $\lambda' = \{[1 - (1 - \lambda)^{1/2}]/[1 + (1 - \lambda)^{1/2}]\}^2$, admits the parametrization

$$X' = [1 + (1 - \lambda)^{1/2}]^2\, X(1 - X)/(1 - \lambda X),$$
$$Y' = [1 + (1 - \lambda)^{1/2}]\, Y(1 - 2X + \lambda X^2)/(1 - \lambda X)^2,$$

where again $X = s(u, \lambda)$, $Y = s'(u, \lambda)$ and where $(1 - 2X + \lambda X^2)/(1 - \lambda X)^2$ is the derivative with respect to $X$ of $X(1 - X)/(1 - \lambda X)$. Since also $X' = s(u', \lambda')$, where $u' = [1 + (1 - \lambda)^{1/2}]u$, the map $(X, Y) \mapsto (X', Y')$ defines a homomorphism of the group of complex points of $\mathcal{C}_\lambda$ into the group of complex points of $\mathcal{C}_{\lambda'}$.
We will simply state analogous results for the cubic curve $\mathcal{C}_{A,B}$ over the field $\mathbb{Q}$ of rational numbers, since their verification is elementary. If $(x, y)$ is a rational point of $\mathcal{C}_{A,B}$ with $x \ne 0$ and if

$$x' = (x^2 + Ax + B)/x, \qquad y' = y(x^2 - B)/x^2,$$

then $(x', y')$ is a rational point of $\mathcal{C}_{A',B'}$, where

$$A' = -2A, \qquad B' = A^2 - 4B.$$
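That $(x', y')$ lies on $\mathcal{C}_{A',B'}$ is a finite algebraic identity, and it can be spot-checked with exact arithmetic. The sketch below uses a hypothetical helper `phi` and assumes the example curve $Y^2 = X^3 - 36X$ for illustration, so that $A' = 0$ and $B' = 144$. It applies the map to several points and checks both the equation of $\mathcal{C}_{A',B'}$ and the relation $x' = (y/x)^2$ used below.

```python
from fractions import Fraction


def phi(P, A, B):
    """The map (x, y) -> ((x^2 + Ax + B)/x, y(x^2 - B)/x^2) for a rational point with x != 0."""
    x, y = map(Fraction, P)
    return (x*x + A*x + B) / x, y * (x*x - B) / (x*x)


def on_curve(P, A, B):
    x, y = P
    return y*y == x**3 + A*x*x + B*x


A, B = 0, -36
A1, B1 = -2*A, A*A - 4*B                  # A' = -2A, B' = A^2 - 4B
for P in [(-3, 9), (12, 36), (-2, 8), (18, 72), (Fraction(25, 4), Fraction(-35, 8))]:
    Q = phi(P, A, B)
    assert on_curve(Q, A1, B1)
    assert Q[0] == (Fraction(P[1]) / P[0]) ** 2   # x' = (y/x)^2 is a rational square
    print(P, "->", Q)
```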
Moreover, if we define a map $\varphi$ of the group $E$ of all rational points of $\mathcal{C}_{A,B}$ into the group $E'$ of all rational points of $\mathcal{C}_{A',B'}$ by putting

$$\varphi(x, y) = (x', y') \text{ if } x \ne 0, \qquad \varphi(N) = \varphi(O) = O,$$

then $\varphi$ is a homomorphism, i.e.

$$\varphi(P + Q) = \varphi(P) + \varphi(Q), \qquad \varphi(-P) = -\varphi(P).$$

The range $\varphi(E)$ may not be the whole of $E'$. In fact, since

$$x' = (x^3 + Ax^2 + Bx)/x^2 = (y/x)^2,$$

the first coordinate of any finite point of $\varphi(E)$ must be a rational square. Furthermore, if $N = (0, 0)$ is a point of $\varphi(E)$, the integer $B' = A^2 - 4B$ must be a square. We will show that these conditions completely characterize $\varphi(E)$.

Evidently if $A^2 - 4B$ is a square, then the quadratic polynomial $X^2 + AX + B$ has a rational root $x_0 \ne 0$ and $\varphi(x_0, 0) = N$. Suppose now that $(x', y')$ is a rational point of $\mathcal{C}_{A',B'}$ and that $x' = t^2$ is a nonzero rational square. We will show that if

$$x_1 = (t^2 - A + y'/t)/2, \qquad y_1 = tx_1,$$
$$x_2 = (t^2 - A - y'/t)/2, \qquad y_2 = -tx_2,$$

then $(x_j, y_j) \in E$ and $\varphi(x_j, y_j) = (x', y')$ $(j = 1, 2)$. It is easily seen that $(x_j, y_j) \in E$ if and only if

$$t^2 = x_j + A + B/x_j.$$

But

$$\begin{aligned} x_1x_2 &= [(t^2 - A)^2 - y'^2/t^2]/4 \\ &= [(x' - A)^2 - y'^2/x']/4 \\ &= (x'^3 - 2Ax'^2 + A^2x' - y'^2)/4x'. \end{aligned}$$

Since

$$y'^2 = x'^3 - 2Ax'^2 + (A^2 - 4B)x',$$

it follows that $x_1x_2 = B$. Hence $(x_1, y_1)$ and $(x_2, y_2)$ are both in $E$ if $t^2 = x_1 + A + x_2$, and this condition is certainly satisfied by the definitions of $x_1$ and $x_2$.
In addition to

$$x_j + A + B/x_j = t^2 = x' \qquad (j = 1, 2),$$

we have

$$y_1(x_1^2 - B)/x_1^2 = t(x_1^2 - x_1x_2)/x_1 = t(x_1 - x_2) = y',$$

and similarly $y_2(x_2^2 - B)/x_2^2 = y'$. It follows that

$$\varphi(x_1, y_1) = \varphi(x_2, y_2) = (x', y').$$
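The construction of the two preimages is explicit, so it can be tested directly. In this sketch the helper names are hypothetical, and the example curve $Y^2 = X^3 - 36X$ together with the point $(9, 45)$ of $\mathcal{C}_{A',B'}$ (so $t = 3$) are assumptions for illustration. The formulas for $x_j, y_j$ produce the points $(12, 36)$ and $(-3, 9)$, whose $x$-coordinates multiply to $B$ and which both map to $(9, 45)$.

```python
from fractions import Fraction


def phi(P, A, B):
    """(x, y) -> ((x^2 + Ax + B)/x, y(x^2 - B)/x^2), defined for x != 0."""
    x, y = map(Fraction, P)
    return (x*x + A*x + B) / x, y * (x*x - B) / (x*x)


def preimages(Pp, t, A):
    """Given (x', y') on C_{A',B'} with x' = t^2 != 0, return the points (x1, y1), (x2, y2) of the text."""
    xp, yp = map(Fraction, Pp)
    t = Fraction(t)
    assert t * t == xp and t != 0
    x1 = (t*t - A + yp/t) / 2
    x2 = (t*t - A - yp/t) / 2
    return (x1, t*x1), (x2, -t*x2)


A, B = 0, -36
P1, P2 = preimages((9, 45), 3, A)
print(P1, P2)                                  # (12, 36) and (-3, 9), as Fractions
assert P1[0] * P2[0] == B                      # x1 x2 = B
for x, y in (P1, P2):
    assert y*y == x**3 + A*x*x + B*x           # both points lie on C_{A,B}
    assert phi((x, y), A, B) == (9, 45)        # and both map to (x', y')
```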


Since $\varphi$ is a homomorphism, the range $\varphi(E)$ is a subgroup of $E'$. We are going to show that this subgroup is of finite index in $E'$. By what we have already proved for $E$, there exists a finite (or empty) set $P'_1 = (x'_1, y'_1), \ldots, P'_s = (x'_s, y'_s)$ of points of $E'$ such that $x'_i$ is not a rational square $(1 \le i \le s)$ and such that, if $P' = (x', y')$ is any other point of $E'$ for which $x'$ is not a rational square, then $x'x'_j$ is a nonzero rational square for a unique $j \in \{1, \ldots, s\}$. Let $P'' = (x'', y'')$ be the third point of intersection with $\mathcal{C}_{A',B'}$ of the line through $P'$ and $P'_j$, so that

$$P' + P'_j + P'' = O.$$

By what we have already proved, either $x''$ is a nonzero rational square or $P'' = N$ and $x'x'_j = B'$ is a square. In either case, $P'' \in \varphi(E)$. Furthermore, if $2P'_j = (\bar{x}, -\bar{y})$, then either $\bar{x}$ is a nonzero rational square or $2P'_j = N$ and $x_j'^2 = B'$. In either case again, $2P'_j \in \varphi(E)$. Since

$$P' = P'_j - (2P'_j + P''),$$

it follows that $P'$ and $P'_j$ are in the same coset of $\varphi(E)$. Consequently $P'_1, \ldots, P'_s$, together with $O$, and also $N$ if $B'$ is not a square, form a complete set of representatives of the cosets of $\varphi(E)$ in $E'$.

The preceding discussion can be repeated with $\mathcal{C}_{A',B'}$ in the place of $\mathcal{C}_{A,B}$. It yields a homomorphism $\varphi'$ of the group $E'$ of all rational points of $\mathcal{C}_{A',B'}$ into the group $E''$ of all rational points of $\mathcal{C}_{A'',B''}$, where

$$A'' = -2A' = 4A, \qquad B'' = A'^2 - 4B' = 16B.$$

But the simple transformation $(X, Y) \mapsto (X/4, Y/8)$ replaces $\mathcal{C}_{A'',B''}$ by $\mathcal{C}_{A,B}$ and defines an isomorphism $\chi$ of $E''$ with $E$. Hence the composite map $\psi = \chi \circ \varphi'$ is a homomorphism of $E'$ into $E$, and $\psi \circ \varphi$ is a homomorphism of $E$ into itself.

We now show that the homomorphism $P \mapsto \psi \circ \varphi(P)$ is just the doubling map $P \mapsto 2P$. Since this is obvious if $P = O$ or $N$, we need only verify it for $P = (x, y)$ with $x \ne 0$.

For $P'' = \varphi' \circ \varphi(P)$ we have

$$x'' = (y'/x')^2 = [y(1 - B/x^2) \cdot x^2/y^2]^2 = (x^2 - B)^2/y^2$$

and

$$\begin{aligned} y'' = y'(1 - B'/x'^2) &= y(1 - B/x^2)[1 - (A^2 - 4B)x^4/y^4] \\ &= (x^2 - B)[y^4 - (A^2 - 4B)x^4]/x^2y^3 \\ &= (x^2 - B)[(x^2 + Ax + B)^2 - (A^2 - 4B)x^2]/y^3. \end{aligned}$$
Hence for $\psi \circ \varphi(P) = P^* = (x^*, y^*)$ we have

$$x^* = (x^2 - B)^2/4y^2,$$
$$y^* = (x^2 - B)[(x^2 + Ax + B)^2 - (A^2 - 4B)x^2]/8y^3.$$

On the other hand, if the tangent to $\mathcal{C}_{A,B}$ at $P$ intersects $\mathcal{C}_{A,B}$ again at $(\bar{x}, \bar{y})$, then $2P = (\bar{x}, -\bar{y})$. The cubic equation

$$(mX + c)^2 = X^3 + AX^2 + BX$$

has $x$ as a double root and $\bar{x}$ as its third root. Hence $\bar{x} = (c/x)^2$. Using the formula for $c$ given previously, we obtain

$$\bar{x} = (x^2 - B)^2/4y^2 = x^*.$$

Furthermore, using the formula for $m$ given previously,

$$\begin{aligned} \bar{y} = m\bar{x} + c &= [(3x^2 + 2Ax + B)\bar{x} - x(x^2 - B)]/2y \\ &= (x^2 - B)[(3x^2 + 2Ax + B)(x^2 - B) - 4xy^2]/8y^3. \end{aligned}$$

Substituting $x^3 + Ax^2 + Bx$ for $y^2$, we obtain $\bar{y} = -y^*$. Thus $\psi \circ \varphi(P) = 2P$, as claimed.
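The identity $\psi \circ \varphi(P) = 2P$ can also be confirmed numerically. The sketch below assumes the example curve $Y^2 = X^3 - 36X$ (so $A'' = 0$, $B'' = -576 = 16B$) and, as a second example, $Y^2 = X^3 + 5X^2 + 4X$ with the point $(-2, 2)$. It applies $\varphi$, then the analogous map $\varphi'$ on $\mathcal{C}_{A',B'}$, then $\chi: (X, Y) \mapsto (X/4, Y/8)$, and compares the result with tangent-line doubling on $\mathcal{C}_{A,B}$. Helper names are hypothetical.

```python
from fractions import Fraction


def phi(P, A, B):
    """The map (x, y) -> ((x^2 + Ax + B)/x, y(x^2 - B)/x^2) from C_{A,B} to C_{-2A, A^2-4B}, for x != 0."""
    x, y = map(Fraction, P)
    return (x*x + A*x + B) / x, y * (x*x - B) / (x*x)


def double(P, A, B):
    """Tangent-line doubling of a point with y != 0 on y^2 = x^3 + A x^2 + B x."""
    x, y = map(Fraction, P)
    m = (3*x*x + 2*A*x + B) / (2*y)
    x2 = m*m - A - 2*x
    return x2, m*(x - x2) - y


def psi_phi(P, A, B):
    A1, B1 = -2*A, A*A - 4*B                 # coefficients of C_{A',B'}
    X, Y = phi(phi(P, A, B), A1, B1)         # lands on C_{A'',B''} = C_{4A,16B}
    return X / 4, Y / 8                      # chi: back to C_{A,B}


A, B = 0, -36
for P in [(-3, 9), (12, 36), (-2, 8), (18, 72)]:
    print(P, psi_phi(P, A, B), double(P, A, B))
    assert psi_phi(P, A, B) == double(P, A, B)
```

In the second example $2 \cdot (-2, 2) = N = (0, 0)$, so that point has order 4; the composite map reproduces this as well.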

Since $\varphi(E)$ has finite index in $E'$, and likewise $\psi(E')$ has finite index in $E$, it follows that $2E = \psi \circ \varphi(E)$ has finite index in $E$. (The proof shows that the index is at most $2^{\alpha + \beta + 2}$, where $\alpha$ is the number of distinct prime divisors of $B$ and $\beta$ is the number of distinct prime divisors of $A^2 - 4B$.)

By the remarks after the proof of Proposition 12, Mordell's theorem has now been completely proved in the case where $E$ contains an element of order 2.


5 Further Results and Conjectures


Let $\mathcal{C}_{a,b}$ be the elliptic curve defined by the polynomial

$$Y^2 - (X^3 + aX + b),$$

where $a, b \in \mathbb{Z}$ and $d := 4a^3 + 27b^2 \ne 0$. By Mordell's theorem, the abelian group $E = E_{a,b}(\mathbb{Q})$ of all rational points of $\mathcal{C}_{a,b}$ is finitely generated. It follows from the structure theorem for finitely generated abelian groups (Chapter III, §4) that $E$ is the direct sum of a finite abelian group $E^t$ and a 'free' abelian group $E^f$, which is the direct sum of $r \ge 0$ infinite cyclic subgroups. The non-negative integer $r$ is called the rank of the elliptic curve and $E^t$ its torsion group.

The torsion group can, in principle, be determined by a finite amount of computation. A theorem of Nagell (1935) and Lutz (1937) says that if $P = (x, y)$ is a point of $E$ of finite order, then $x$ and $y$ are integers and either $y = 0$ or $y^2$ divides $d$. Thus there are only finitely many possibilities to check.
A deep theorem of Mazur (1977) says that the torsion group must be one of the following:

(i) a cyclic group of order $n$ $(1 \le n \le 10$ or $n = 12)$,
(ii) the direct sum of a cyclic group of order 2 and a cyclic group of order $2n$ $(1 \le n \le 4)$.

It was already known that each of these possibilities occurs. It is easy to check if the torsion group is of type (i) or type (ii), since in the latter case there are three elements of order 2, whereas in the former case there is at most one. Mazur's result shows that a point $P \in E$ has infinite order if $nP \ne O$ for $1 \le n \le 12$.
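The theorem of Nagell and Lutz and Mazur's bound together give a simple, if slow, procedure for finding the torsion group. The following sketch (hypothetical helper names; a brute-force search, not an optimized algorithm) lists the integer points with $y = 0$ or $y^2 \mid d$, and keeps those $P$ with $nP = O$ for some $n \le 12$. Integer roots $x$ of $X^3 + aX + (b - y^2)$ are searched within the standard Cauchy bound $|x| \le 1 + \max(|a|, |b - y^2|)$.

```python
from fractions import Fraction
from math import isqrt


def add(P, Q, a, b):
    """Chord-and-tangent addition on y^2 = x^3 + a x + b; None stands for O."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2:
        if y1 == -y2:
            return None
        m = (3*x1*x1 + a) / (2*y1)
    else:
        m = (y2 - y1) / (x2 - x1)
    x3 = m*m - x1 - x2
    return (x3, m*(x1 - x3) - y1)


def torsion_points(a, b):
    """Rational torsion points of y^2 = x^3 + a x + b (a, b integers, d = 4a^3 + 27b^2 != 0)."""
    d = 4*a**3 + 27*b**2
    candidates = []
    for y in range(isqrt(abs(d)) + 1):
        if y != 0 and d % (y*y) != 0:       # Nagell-Lutz: y = 0 or y^2 divides d
            continue
        c = b - y*y
        bound = 1 + max(abs(a), abs(c))     # Cauchy bound for integer roots of X^3 + aX + c
        for x in range(-bound, bound + 1):
            if x**3 + a*x + c == 0:
                candidates.append((Fraction(x), Fraction(y)))
                if y:
                    candidates.append((Fraction(x), Fraction(-y)))
    torsion = [None]                        # O is always a torsion point
    for P in candidates:
        Q = P
        for _ in range(11):                 # compute 2P, ..., 12P
            Q = add(Q, P, a, b)
            if Q is None:                   # nP = O for some n <= 12 (Mazur's bound)
                torsion.append(P)
                break
    return torsion


print(len(torsion_points(0, 1)))    # y^2 = x^3 + 1
print(len(torsion_points(-1, 0)))   # y^2 = x^3 - x
```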

It is conjectured that there exist elliptic curves over Q with arbitrarily large rank. 
(Examples are known of elliptic curves with rank > 22.) At present no infallible algo- 
rithm is known for determining the rank of an elliptic curve, let alone a basis for the 
torsion-free group E/. However, Manin (1971) devised a conditional algorithm, based 
on the strong conjecture of Birch and Swinnerton-Dyer which will be mentioned later. 
This conjecture is still unproved, but is supported by much numerical evidence. 

An important way of obtaining arithmetic information about an elliptic curve is by reduction modulo a prime $p$. We regard the coefficients not as integers, but as integers mod $p$, and we look not for $\mathbb{Q}$-points, but for $\mathbb{F}_p$-points. Since the normal form $\mathcal{C}_{a,b}$ was obtained by assuming that the field had characteristic $\ne 2, 3$, we now adopt a more general normal form.

Let $\mathcal{W} = \mathcal{W}(a_1, \ldots, a_6)$ be the projective completion of the affine cubic curve defined by the polynomial

$$Y^2 + a_1XY + a_3Y - (X^3 + a_2X^2 + a_4X + a_6),$$

where $a_j \in \mathbb{Q}$ $(j = 1, 2, 3, 4, 6)$. It may be shown that $\mathcal{W}$ is non-singular if and only if the discriminant $\Delta \ne 0$, where

$$\Delta = -b_2^2b_8 - 8b_4^3 - 27b_6^2 + 9b_2b_4b_6$$

and

$$b_2 = a_1^2 + 4a_2, \qquad b_4 = a_1a_3 + 2a_4, \qquad b_6 = a_3^2 + 4a_6,$$
$$b_8 = a_1^2a_6 - a_1a_3a_4 + 4a_2a_6 + a_2a_3^2 - a_4^2.$$

(We retain the name 'discriminant', although $\Delta = -16d$ for $\mathcal{W} = \mathcal{C}_{a,b}$.) The definition of addition on $\mathcal{W}$ has the same geometrical interpretation as on $\mathcal{C}_{a,b}$, although the corresponding algebraic formulas are different. They are written out in §7.
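The quantities $b_2, b_4, b_6, b_8$ and $\Delta$ are straightforward to compute. The sketch below (function name hypothetical) evaluates them from $a_1, a_2, a_3, a_4, a_6$ and checks the relation $\Delta = -16d$ for $\mathcal{W} = \mathcal{C}_{a,b}$ over a range of $a, b$. The two further examples, $Y^2 + Y = X^3 - X$ and $Y^2 + Y = X^3 - X^2$, are arbitrary illustrations, for which the formula gives $37$ and $-11$.

```python
def discriminant(a1, a2, a3, a4, a6):
    """Discriminant of Y^2 + a1 XY + a3 Y - (X^3 + a2 X^2 + a4 X + a6), via b2, b4, b6, b8."""
    b2 = a1*a1 + 4*a2
    b4 = a1*a3 + 2*a4
    b6 = a3*a3 + 4*a6
    b8 = a1*a1*a6 - a1*a3*a4 + 4*a2*a6 + a2*a3*a3 - a4*a4
    return -b2*b2*b8 - 8*b4**3 - 27*b6*b6 + 9*b2*b4*b6


# For C_{a,b} (a1 = a2 = a3 = 0, a4 = a, a6 = b) the discriminant is -16 d with d = 4a^3 + 27b^2.
for a in range(-5, 6):
    for b in range(-5, 6):
        assert discriminant(0, 0, 0, a, b) == -16 * (4*a**3 + 27*b**2)

print(discriminant(0, 0, 1, -1, 0))   # Y^2 + Y = X^3 - X
print(discriminant(0, -1, 1, 0, 0))   # Y^2 + Y = X^3 - X^2
```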
For any $u, r, s, t \in \mathbb{Q}$ with $u \ne 0$, the invertible linear change of variables

$$X = u^2X' + r, \qquad Y = u^3Y' + su^2X' + t$$

replaces $\mathcal{W}$ by a curve $\mathcal{W}'$ of the same form with discriminant $\Delta' = u^{-12}\Delta$. By means of such a transformation we may assume that the coefficients $a_j$ are integers and that $\Delta$, which is now an integer, has minimal absolute value. (It has been proved by Tate that we then have $|\Delta| > 1$.) The discussion which follows presupposes that $\mathcal{W}$ is chosen in this way so that, in particular, discriminant means 'minimal discriminant'. We say that such a $\mathcal{W}$ is a minimal model for the elliptic curve.

For any prime $p$, let $\mathcal{W}_p$ be the cubic curve defined over the finite field $\mathbb{F}_p$ by the polynomial

$$Y^2 + \bar{a}_1XY + \bar{a}_3Y - (X^3 + \bar{a}_2X^2 + \bar{a}_4X + \bar{a}_6),$$

where $\bar{a}_j \in a_j + p\mathbb{Z}$. If $p \nmid \Delta$ the cubic curve $\mathcal{W}_p$ is non-singular, but if $p \mid \Delta$ then $\mathcal{W}_p$ has a unique singular point. The singular point $(x_0, y_0)$ of $\mathcal{W}_p$ is a cusp if, on replacing $X$ and $Y$ by $x_0 + X$ and $y_0 + Y$, we obtain a polynomial of the form

$$c(aX + bY)^2 + \cdots,$$

where $a, b, c \in \mathbb{F}_p$ and the unwritten terms are of degree $> 2$. Otherwise, the singular point is a node.

For any prime $p$, let $N_p$ denote the number of $\mathbb{F}_p$-points of $\mathcal{W}_p$, including the point at infinity $O$, and put

$$c_p = p + 1 - N_p.$$

It was conjectured by Artin (1924), and proved by Hasse (1934), that

$$|c_p| \le 2p^{1/2} \quad \text{if } p \nmid \Delta.$$

Since $2p^{1/2}$ is not an integer, this inequality says that the quadratic polynomial

$$1 - c_pT + pT^2$$

has conjugate complex roots $\gamma_p, \bar{\gamma}_p$ of absolute value $p^{-1/2}$. Equivalently, if we put $T = p^{-s}$, it says that the zeros of

$$1 - c_pp^{-s} + p^{1-2s}$$

lie on the line $\Re s = 1/2$. Thus it is an analogue of the Riemann hypothesis on the zeros of $\zeta(s)$, but differs from it by having been proved. (As mentioned in §5 of Chapter IX, Hasse's result was considerably generalized by Weil (1948) and Deligne (1974).)
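For small primes, $N_p$ and $c_p$ can be found by direct counting. The sketch below restricts, as an assumption, to curves $Y^2 = X^3 + aX + b$ and primes $p > 3$ not dividing the discriminant; it counts points with Euler's criterion, checks Hasse's inequality in the equivalent form $c_p^2 < 4p$, and confirms that the roots of $1 - c_pT + pT^2$ have absolute value $p^{-1/2}$. The example curve $Y^2 = X^3 + 1$ (discriminant $-432$) is an arbitrary choice.

```python
import cmath


def count_points(a, b, p):
    """N_p for y^2 = x^3 + a x + b over F_p (p an odd prime), including the point at infinity."""
    total = 1                                   # the point O
    for x in range(p):
        v = (x**3 + a*x + b) % p
        if v == 0:
            total += 1                          # one point with y = 0
        elif pow(v, (p - 1) // 2, p) == 1:      # Euler's criterion: v is a nonzero square mod p
            total += 2                          # two points (x, +y) and (x, -y)
    return total


a, b = 0, 1                                     # the curve Y^2 = X^3 + 1
for p in [5, 7, 11, 13, 17, 19, 23, 29, 31, 37]:
    c = p + 1 - count_points(a, b, p)
    assert c * c < 4 * p                        # Hasse: |c_p| <= 2 p^(1/2), strict since 4p is not a square
    disc = cmath.sqrt(c * c - 4 * p)            # roots of p T^2 - c T + 1 = 0
    roots = [(c + disc) / (2 * p), (c - disc) / (2 * p)]
    assert all(abs(abs(r) - p ** -0.5) < 1e-12 for r in roots)
    print(p, c)
```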

The $L$-function of the original elliptic curve $\mathcal{W}$ is defined by

$$L(s) = L(s, \mathcal{W}) = \prod_{p \mid \Delta}(1 - c_pp^{-s})^{-1}\prod_{p \nmid \Delta}(1 - c_pp^{-s} + p^{1-2s})^{-1}.$$

The first product on the right side has only finitely many factors. The infinite second product is convergent for $\Re s > 3/2$, since

$$1 - c_pp^{-s} + p^{1-2s} = (1 - p^{1/2}\gamma_p\,p^{1/2-s})(1 - p^{1/2}\bar{\gamma}_p\,p^{1/2-s})$$

and $|\gamma_p| = |\bar{\gamma}_p| = p^{-1/2}$. Multiplying out the products, we obtain for $\Re s > 3/2$ an absolutely convergent Dirichlet series

$$L(s) = \sum_{n \ge 1} c_nn^{-s}$$

with integer coefficients $c_n$. (If $n = p$ is prime, then $c_n$ is the previously defined $c_p$.)
The conductor $N = N(\mathcal{W})$ of the elliptic curve $\mathcal{W}$ is defined by the singular reductions $\mathcal{W}_p$ of $\mathcal{W}$:

$$N = \prod_{p \mid \Delta} p^{f_p},$$

where $f_p = 1$ if $\mathcal{W}_p$ has a node, whereas $f_p = 2$ if $p > 3$ and $\mathcal{W}_p$ has a cusp. We will not define $f_p$ if $p \in \{2, 3\}$ and $\mathcal{W}_p$ has a cusp, but we mention that $f_p$ is then an integer $\ge 2$ which can be calculated by an algorithm due to Tate (1975). (It may be shown that $f_2 \le 8$ and $f_3 \le 5$.)

The elliptic curve $\mathcal{W}$ is said to be semi-stable if $\mathcal{W}_p$ has a node for every $p \mid \Delta$. Thus, for a semi-stable elliptic curve, the conductor $N$ is precisely the product of the distinct primes dividing the discriminant $\Delta$. (The semi-stable case is the only one in which the conductor is square-free.)

Three important conjectures about elliptic curves, involving their L-functions and 
conductors, will now be described. 

It was conjectured by Hasse (1954) that the function

$$\zeta(s, \mathcal{W}) := \zeta(s)\zeta(s - 1)/L(s, \mathcal{W})$$

may be analytically continued to a function which is meromorphic in the whole complex plane and that $\zeta(2 - s, \mathcal{W})$ is connected with $\zeta(s, \mathcal{W})$ by a functional equation similar to that satisfied by the Riemann zeta-function $\zeta(s)$. In terms of $L$-functions, Hasse's conjecture was given the following precise form by Weil (1967):

HW-Conjecture: If the elliptic curve $\mathcal{W}$ has $L$-function $L(s)$ and conductor $N$, then $L(s)$ may be analytically continued, so that the function

$$\Lambda(s) = (2\pi)^{-s}\Gamma(s)L(s),$$

where $\Gamma(s)$ denotes Euler's gamma-function, is holomorphic throughout the whole complex plane and satisfies the functional equation

$$\Lambda(s) = \pm N^{1-s}\Lambda(2 - s).$$

(In fact it is the functional equation which determines the precise definition of the conductor.)


The second conjecture, due to Birch and Swinnerton-Dyer (1965), connects the $L$-function with the group of rational points:

BSD-Conjecture: The $L$-function $L(s)$ of the elliptic curve $\mathcal{W}$ has a zero at $s = 1$ of order exactly equal to the rank $r \ge 0$ of the group $E = E(\mathcal{W}, \mathbb{Q})$ of all rational points of $\mathcal{W}$.

This is sometimes called the 'weak' conjecture of Birch and Swinnerton-Dyer, since they also gave a 'strong' version, in which the nonzero constant $C$ such that

$$L(s) \sim C(s - 1)^r \quad \text{for } s \to 1$$

is expressed by other arithmetic invariants of $\mathcal{W}$. The strong conjecture may be regarded as an analogue for elliptic curves of a known formula for the Dedekind zeta-function of an algebraic number field. An interesting reformulation of the strong form has been given by Bloch (1980).

The statement of the third conjecture requires some preparation. For any positive integer $N$, let $\Gamma_0(N)$ denote the multiplicative group of all matrices

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix},$$

where $a, b, c, d$ are integers such that $ad - bc = 1$ and $c \equiv 0 \bmod N$. A function $f(\tau)$ which is holomorphic for $\tau \in \mathcal{H}$ (the upper half-plane) is said to be a modular form of weight 2 for $\Gamma_0(N)$ if, for every such $A$,

$$f((a\tau + b)/(c\tau + d)) = (c\tau + d)^2 f(\tau).$$

An elliptic curve $\mathcal{W}$, with $L$-function

$$L(s) = \sum_{n \ge 1} c_nn^{-s}$$

and conductor $N$, is said to be modular if the function

$$f(\tau) = \sum_{n \ge 1} c_n e^{2\pi i n\tau},$$

which is certainly holomorphic in $\mathcal{H}$, is a modular form of weight 2 for $\Gamma_0(N)$. This actually implies that $f$ is a 'cusp form' and satisfies a functional equation

$$f(-1/N\tau) = \mp N\tau^2 f(\tau).$$

It follows that the Mellin transform

$$\Lambda(s) = \int_0^\infty f(iy)\,y^{s-1}\,dy$$

may be analytically continued for all $s \in \mathbb{C}$ and satisfies the functional equation

$$\Lambda(s) = \pm N^{1-s}\Lambda(2 - s).$$

(Note the reversal of sign.) But

$$\Lambda(s) = (2\pi)^{-s}\Gamma(s)L(s),$$

since, by (9) of Chapter IX,

$$\int_0^\infty e^{-2\pi ny}y^{s-1}\,dy = (2\pi n)^{-s}\Gamma(s).$$

Hence any modular elliptic curve satisfies the HW-conjecture.


It was shown by Weil (1967) that, conversely, an elliptic curve is modular if not only its $L$-function $L(s) = \sum_{n \ge 1} c_nn^{-s}$ has the properties required in the HW-conjecture but also, for sufficiently many Dirichlet characters $\chi$, the 'twisted' $L$-functions

$$L(s, \chi) = \sum_{n \ge 1} \chi(n)c_nn^{-s}$$

have analogous properties.

The definition of modular elliptic curve can be given a more intuitive form: the elliptic curve $\mathcal{C}_{a,b}$ is modular if there exist non-constant functions $X = f(\tau)$, $Y = g(\tau)$ which are holomorphic in the upper half-plane, which are invariant under $\Gamma_0(N)$, i.e.

$$f((a\tau + b)/(c\tau + d)) = f(\tau), \qquad g((a\tau + b)/(c\tau + d)) = g(\tau)$$

for every

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \Gamma_0(N),$$

and which parametrize $\mathcal{C}_{a,b}$:

$$g(\tau)^2 = f(\tau)^3 + af(\tau) + b.$$

The significance of modular elliptic curves is that one can apply to them the extensive analytic theory of modular forms. For example, through the work of Kolyvagin (1990), together with results of Gross and Zagier (1986) and others, it is known that (as the BSD-conjecture predicts) a modular elliptic curve has rank 0 if its $L$-function does not vanish at $s = 1$, and has rank 1 if its $L$-function has a simple zero at $s = 1$.

The third conjecture, stated rather roughly by Taniyama (1955) and more precisely 
by Weil (1967), is simply this: 


TW-Conjecture: Every elliptic curve over the field Q of rational numbers is modular. 


The name of Shimura is often also attached to this conjecture, since he certainly 
contributed to its ultimate formulation. Shimura (1971) further showed that any elliptic 
curve which admits complex multiplication is modular. A big step forward was made 
by Wiles (1995) who, with assistance from Taylor, showed that any semi-stable elliptic 
curve is modular. A complete proof of the TW-conjecture, due to Diamond and others, 
has recently been announced by Darmon (1999). Thus all the results which had previ- 
ously been established for modular elliptic curves actually hold for all elliptic curves 
over Q. 

It should be mentioned that there is also a 'Riemann hypothesis' for elliptic curves over $\mathbb{Q}$, namely that all zeros of the $L$-function in the critical strip $1/2 < \Re s < 3/2$ lie on the line $\Re s = 1$.

Mordell’s theorem was extended from elliptic curves over Q to abelian varieties 
over any algebraic number field by Weil (1928). Many other results in the arithmetic 
of elliptic curves have been similarly extended. The topic is too vast to be considered 
here, but it should be said that our exposition for the prototype case is not always in 
the most appropriate form for such generalizations. 
In the same paper in which he proved his theorem, Mordell (1922) conjectured that if a non-singular irreducible projective curve, defined by a homogeneous polynomial $F(x, y, z)$ with rational coefficients, has infinitely many rational points, then it is birationally equivalent to a line, a conic or a cubic. Mordell's conjecture was first proved by Faltings (1983). Actually Faltings' result was not restricted to plane algebraic curves, and on the way he proved two other important conjectures of Tate and Shafarevich.

Faltings' result implies that the Fermat equation $x^n + y^n = z^n$ has at most finitely many solutions in relatively prime integers if $n > 3$. In the next section we will see that Wiles' result that semi-stable elliptic curves are modular implies that there are no solutions in nonzero integers.


6 Some Applications 


The arithmetic of elliptic curves has an interesting application to the ancient problem 
of congruent numbers. A positive integer n is (confusingly) said to be congruent if it is 
the area of a right-angled triangle whose sides all have rational length, i.e. if there exist 
positive rational numbers u,v, w such that u2 + v2 = w?, uv = 2n. For example, 6 is 
congruent, since it is the area of the right-angled triangle with sides of length 3, 4, 5. 
Similarly, 5 is congruent, since it is the area of the right-angled triangle with sides of 
length 3/2, 20/3, 41/6. 
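Both examples can be checked with exact rational arithmetic. The following is a minimal Python sketch using the standard `fractions` module; the helper name `right_triangle_area` is purely illustrative.

```python
from fractions import Fraction as F

def right_triangle_area(u, v, w):
    """Return the area uv/2 if u, v > 0 and u^2 + v^2 = w^2, otherwise None."""
    if u > 0 and v > 0 and u * u + v * v == w * w:
        return u * v / 2
    return None

# The two examples from the text, checked exactly (no floating point).
print(right_triangle_area(F(3), F(4), F(5)))              # 6
print(right_triangle_area(F(3, 2), F(20, 3), F(41, 6)))   # 5
```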

In the margin of his copy of Diophantus’ Arithmetica Fermat (c. 1640) gave a
complete proof that 1 is not congruent. The following is a paraphrase of his argument.
Assume that 1 is congruent. Then there exist positive rational numbers u, v, w such
that

$$u^2 + v^2 = w^2, \quad uv = 2.$$


Since an integer is a rational square only if it is an integral square, on clearing
denominators it follows that there exist positive integers a, b, c, d such that

$$a^2 + b^2 = c^2, \quad 2ab = d^2.$$


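Choose such a quadruple a, b, c, d for which c is minimal. Then (a, b) = 1. Since
$d^2 = 2ab$, d is even, so ab is even, and hence exactly one of a, b is even; we may
suppose it to be a. Since 2a and b are then relatively prime and their product is the
square $d^2$,

$$a = 2g^2, \quad b = h^2$$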


for some positive integers g, h. Since b and c are both odd and (b, c) = 1,

$$(c - b, c + b) = 2.$$

Since

$$(c - b)(c + b) = a^2 = 4g^4,$$

it follows that

$$c + b = 2c_1^4, \quad c - b = 2d_1^4,$$

for some relatively prime positive integers $c_1, d_1$. Then

$$(c_1^2 - d_1^2)(c_1^2 + d_1^2) = c_1^4 - d_1^4 = b = h^2.$$

But

$$(c_1^2 - d_1^2, \; c_1^2 + d_1^2) = 1,$$

since $(c_1, d_1) = 1$ and b is odd. Hence

$$c_1^2 - d_1^2 = p^2, \quad c_1^2 + d_1^2 = q^2,$$

for some odd positive integers p, q. Thus

$$a_1 = (q + p)/2, \quad b_1 = (q - p)/2$$

are positive integers and

$$a_1^2 + b_1^2 = (q^2 + p^2)/2 = c_1^2,$$

$$2a_1b_1 = (q^2 - p^2)/2 = d_1^2.$$

Since $c_1 < c_1^4 < c$, this contradicts the minimality of c.
It follows that the Fermat equation

$$x^4 + y^4 = z^4$$

has no solutions in nonzero integers x, y, z. For if a solution existed and if we put

$$u = 2|yz|/x^2, \quad v = x^2/|yz|, \quad w = (y^4 + z^4)/x^2|yz|,$$

we would have $u^2 + v^2 = w^2$ and $uv = 2$.
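Since the point is that no nonzero solution exists, no example can be plugged in. What can be checked is the algebra behind the substitution: $uv = 2$ for every triple of nonzero integers, and $u^2 + v^2 - w^2 = (x^8 - (z^4 - y^4)^2)/(x^4y^2z^2)$, which vanishes exactly when $x^4 = \pm(z^4 - y^4)$. A minimal Python sketch with exact rationals; the helper names and the sample triples are illustrative assumptions.

```python
from fractions import Fraction

def fermat_substitution(x, y, z):
    """The substitution from the text, (x, y, z) -> (u, v, w), in exact rationals."""
    yz = abs(y * z)
    u = Fraction(2 * yz, x * x)
    v = Fraction(x * x, yz)
    w = Fraction(y**4 + z**4, x * x * yz)
    return u, v, w

def defect(x, y, z):
    """u^2 + v^2 - w^2 for the substituted values."""
    u, v, w = fermat_substitution(x, y, z)
    return u * u + v * v - w * w

# Arbitrary nonzero triples: uv = 2 always, and the defect has the closed form above.
for x, y, z in [(1, 2, 3), (2, 3, 5), (3, -1, 4)]:
    u, v, w = fermat_substitution(x, y, z)
    assert u * v == 2
    assert defect(x, y, z) == Fraction(x**8 - (z**4 - y**4) ** 2, x**4 * y**2 * z**2)
```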

It is easily seen that a positive integer n is congruent if and only if there exists a
rational number x such that x, x + n and x − n are all rational squares. For suppose

$$x = r^2, \quad x + n = s^2, \quad x - n = t^2,$$

where r, s, t may be taken positive, and put

$$u = s - t, \quad v = s + t, \quad w = 2r.$$

Then

$$uv = s^2 - t^2 = 2n$$

and

$$u^2 + v^2 = 2(s^2 + t^2) = 4x = w^2.$$
Conversely, if u, v, w are rational numbers such that $uv = 2n$ and $u^2 + v^2 = w^2$, then

$$(u + v)^2 = w^2 + 4n, \quad (u - v)^2 = w^2 - 4n.$$

Thus, if we put $x = (w/2)^2$, then x, x + n and x − n are all rational squares.
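Both directions of this equivalence are short enough to run. The Python sketch below (function names are illustrative) recovers x = (41/12)² from the triangle 3/2, 20/3, 41/6 of area 5, checks that x ± 5 are squares, and rebuilds the triangle from the square roots.

```python
from fractions import Fraction as F

def triangle_from_squares(r, s, t):
    """From x = r^2, x + n = s^2, x - n = t^2 (r, s, t > 0) build (u, v, w)."""
    return s - t, s + t, 2 * r

def squares_from_triangle(u, v, w):
    """From uv = 2n, u^2 + v^2 = w^2 return (x, s, t) with x = (w/2)^2, x + n = s^2, x - n = t^2."""
    x = (w / 2) ** 2
    return x, (u + v) / 2, abs(u - v) / 2

n = 5
x, s, t = squares_from_triangle(F(3, 2), F(20, 3), F(41, 6))
assert x == F(1681, 144) and s * s == x + n and t * t == x - n
print(triangle_from_squares(F(41, 12), s, t))   # (Fraction(3, 2), Fraction(20, 3), Fraction(41, 6))
```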

It may be noted that if x is a rational number such that x, x + n and x — n are all 
rational squares, then x ≠ −n, 0, n, since n > 0 and 2 is not a rational square.

The problem of determining which positive integers are congruent was considered 
by Arab mathematicians of the 10th century AD, and later by Fibonacci (1225) in his 
Liber Quadratorum. The connection with elliptic curves will now be revealed: 


Proposition 13 A positive integer n is congruent if and only if the cubic curve $\mathcal{C}_n$
defined by the polynomial

$$Y^2 - X^3 + n^2X$$

has a rational point P = (x, y) with y ≠ 0.


Proof Suppose first that n is congruent. Then there exists a rational number x such
that x, x + n and x − n are all rational squares. Hence their product

$$x^3 - n^2x = x(x - n)(x + n)$$

is also a rational square. Since x ≠ −n, 0, n, it follows that $x^3 - n^2x = y^2$, where y
is a nonzero rational number.
Suppose now that P = (x, y) is any rational point of the curve $\mathcal{C}_n$ with y ≠ 0. If
we put

$$u = |(x^2 - n^2)/y|, \quad v = |2nx/y|, \quad w = |(x^2 + n^2)/y|,$$

then u, v, w are positive rational numbers such that

$$u^2 + v^2 = w^2, \quad uv = 2n.$$
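The second half of the proof is an explicit map from points to triangles. The Python sketch below applies it to two sample points that are not taken from the text: (12, 36) on $\mathcal{C}_6$ and (−4, 6) on $\mathcal{C}_5$. Both are easily checked by hand, and they recover the triangles 3, 4, 5 and 3/2, 20/3, 41/6 quoted earlier.

```python
from fractions import Fraction as F

def triangle_from_point(n, x, y):
    """Map a rational point (x, y) of C_n: y^2 = x^3 - n^2 x with y != 0 to (u, v, w)."""
    x, y = F(x), F(y)
    if y == 0 or y * y != x**3 - n * n * x:
        raise ValueError("(x, y) must be a point of C_n with y != 0")
    u = abs((x * x - n * n) / y)
    v = abs(2 * n * x / y)
    w = abs((x * x + n * n) / y)
    assert u * u + v * v == w * w and u * v == 2 * n   # the two identities in the proof
    return u, v, w

print(*triangle_from_point(6, 12, 36))   # 3 4 5
print(*triangle_from_point(5, -4, 6))    # 3/2 20/3 41/6
```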


It is readily verified that λ = 1/2 in the Riemann normal form for $\mathcal{C}_n$.

We now show that, for every positive integer n, the torsion group of $\mathcal{C}_n$ has order 4,
consisting of the identity element O and the three elements (0, 0), (n, 0), (−n, 0) of
order 2. Assume on the contrary that for some positive integer n the curve $\mathcal{C}_n$ has a
rational point P = (x, y) of finite order with y ≠ 0, and take n to be the least positive
integer with this property. Then 2P = (x′, y′) is also a rational point of $\mathcal{C}_n$ of finite
order. The formula for the other point of intersection with $\mathcal{C}_n$ of the tangent to $\mathcal{C}_n$ at
P shows that

$$x' = [(x^2 + n^2)/2y]^2.$$
It follows that

$$x' + n = [(x^2 - n^2 + 2nx)/2y]^2,$$
$$x' - n = [(x^2 - n^2 - 2nx)/2y]^2.$$
Moreover x′, x′ + n and x′ − n are all nonzero rational squares. Since 2P is of finite
order, the theorem of Nagell and Lutz mentioned in §5 implies that x′ is an integer.
Consequently

$$x' = r^2, \quad x' + n = s^2, \quad x' - n = t^2$$

for some positive integers r, s, t. Hence n is even, since

$$2n = s^2 - t^2 = (s - t)(s + t)$$

and if one of s − t and s + t is even, so also is the other. Since $n = s^2 - r^2$ and
any integral square is congruent to 0 or 1 mod 4, we cannot have n ≡ 2 mod 4. Hence
n ≡ 0 mod 4. But then (x′/4, y′/8) is a rational point of finite order of $\mathcal{C}_{n/4}$, with
nonzero second coordinate, which contradicts the minimality of n.
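The doubling formula can be checked on a concrete point. The sketch below uses the sample point P = (−4, 6) on $\mathcal{C}_5$, which is our own choice and not from the text. It computes x′ both from the formula above and from the tangent slope λ = (3x² − n²)/2y via x′ = λ² − 2x, and confirms that x′ ± n are the stated squares.

```python
from fractions import Fraction as F

def double_x(n, x, y):
    """x-coordinate of 2P on C_n: y^2 = x^3 - n^2 x, from x' = ((x^2 + n^2) / 2y)^2."""
    x, y = F(x), F(y)
    return ((x * x + n * n) / (2 * y)) ** 2

def tangent_double_x(n, x, y):
    """The same x-coordinate from the tangent slope lam = (3x^2 - n^2) / 2y, as lam^2 - 2x."""
    x, y = F(x), F(y)
    lam = (3 * x * x - n * n) / (2 * y)
    return lam * lam - 2 * x

n, x, y = 5, -4, 6                  # P = (-4, 6) lies on C_5, since (-4)^3 - 25(-4) = 36
x2 = double_x(n, x, y)
assert x2 == tangent_double_x(n, x, y)
assert x2 + n == (F(x * x - n * n + 2 * n * x) / (2 * y)) ** 2
assert x2 - n == (F(x * x - n * n - 2 * n * x) / (2 * y)) ** 2
print(x2)                           # 1681/144
```

Here x′ = (41/12)² is exactly the x attached to the triangle 3/2, 20/3, 41/6. Because x′ is not an integer, the theorem of Nagell and Lutz shows that 2P, and hence P, has infinite order.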

If n is congruent, then so also is $m^2n$ for any positive integer m. Thus it is enough
to determine which square-free positive integers are congruent. By what we have just
proved and Proposition 13, a square-free positive integer n is congruent if and only if
the elliptic curve $\mathcal{C}_n$ has positive rank. Since $\mathcal{C}_n$ admits complex multiplication, a
result of Coates and Wiles (1977) shows that if $\mathcal{C}_n$ has positive rank, then its L-function
vanishes at s = 1. (According to the BSD-conjecture, $\mathcal{C}_n$ has positive rank if and only
if its L-function vanishes at s = 1.)

By means of the theory of modular forms, Tunnell (1983) has obtained a practical
necessary and sufficient condition for the L-function $L(s, \mathcal{C}_n)$ of $\mathcal{C}_n$ to vanish at s = 1:
if n is a square-free positive integer, then $L(1, \mathcal{C}_n) = 0$ if and only if $A_+(n) = A_-(n)$,
where $A_+(n)$, resp. $A_-(n)$, is the number of triples $(x, y, z) \in \mathbb{Z}^3$ with z even, resp. z
odd, such that

$$x^2 + 2y^2 + 8z^2 = n \ \text{ if } n \text{ is odd}, \quad \text{or} \quad 2x^2 + 2y^2 + 16z^2 = n \ \text{ if } n \text{ is even}.$$

It is not difficult to show that $A_+(n) = A_-(n)$ when n ≡ 5, 6 or 7 mod 8, but there
seems to be no such simple criterion in other cases. With the aid of a computer it has
been verified that, for every n < 10000, n is congruent if and only if $A_+(n) = A_-(n)$.
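Tunnell’s counts are easy to compute by brute force. Below is a minimal Python sketch (function names are illustrative) that enumerates the triples directly. By the result of Coates and Wiles, $A_+(n) \ne A_-(n)$ proves unconditionally that n is not congruent. The converse, that $A_+(n) = A_-(n)$ implies n is congruent, depends on the BSD-conjecture.

```python
from math import isqrt

def tunnell_counts(n):
    """Return (A_plus(n), A_minus(n)) for a square-free positive integer n.

    Counts integer triples (x, y, z), split by the parity of z, on
    x^2 + 2y^2 + 8z^2 = n (n odd) or 2x^2 + 2y^2 + 16z^2 = n (n even).
    """
    a, b, c = (1, 2, 8) if n % 2 else (2, 2, 16)
    plus = minus = 0
    zmax = isqrt(n // c)
    for z in range(-zmax, zmax + 1):
        rest_z = n - c * z * z
        ymax = isqrt(rest_z // b)
        for y in range(-ymax, ymax + 1):
            rest = rest_z - b * y * y          # must equal a * x^2
            if rest % a:
                continue
            r = isqrt(rest // a)
            if a * r * r != rest:
                continue
            k = 1 if r == 0 else 2             # x = 0, or the pair x = r, x = -r
            if z % 2 == 0:
                plus += k
            else:
                minus += k
    return plus, minus

def tunnell_predicts_congruent(n):
    plus, minus = tunnell_counts(n)
    return plus == minus

print([n for n in range(1, 40) if tunnell_predicts_congruent(n)])
```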

The arithmetic of elliptic curves also has a useful application to the class number
problem of Gauss. For any square-free integer d < 0, let h(d) be the class number of
the quadratic field Q(√d). As mentioned in §8 of Chapter IV, it was conjectured by
Gauss (1801), and proved by Heilbronn (1934), that h(d) → ∞ as d → −∞. However,
the proof does not provide a method of determining an upper bound for the values
of d for which the class number h(d) has a given value. As mentioned in Chapter II,
Stark (1967) showed that there are no other negative values of d for which h(d) = 1
besides the nine values already known to Gauss. Using methods developed by Baker
(1966) for the theory of transcendental numbers, it was shown by Baker (1971) and
Stark (1971) that there are exactly 18 negative values of d for which h(d) = 2. A
simpler and more powerful method for attacking the problem was found by Goldfeld
(1976). He obtained an effective lower bound for h(d), provided that there exists a
modular elliptic curve over Q whose L-function has a triple zero at s = 1. Gross and
Zagier (1986) showed that such an elliptic curve does indeed exist. However, to show
that this elliptic curve was modular required a considerable amount of computation.
The proof of the TW-conjecture makes any computation unnecessary.
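The text does not say how h(d) is computed. One classical method, not described in this section and used here only as an illustration, counts the reduced primitive binary quadratic forms of the field discriminant D. Here D = d if d ≡ 1 mod 4 and D = 4d otherwise. A Python sketch under these assumptions reproduces the nine values with h(d) = 1 and the 18 with h(d) = 2.

```python
from math import gcd

def is_squarefree(m):
    m = abs(m)
    k = 2
    while k * k <= m:
        if m % (k * k) == 0:
            return False
        k += 1
    return True

def field_discriminant(d):
    """Discriminant of Q(sqrt(d)) for square-free d."""
    return d if d % 4 == 1 else 4 * d

def class_number(D):
    """Count reduced primitive forms (a, b, c) with b^2 - 4ac = D < 0:
    |b| <= a <= c, and b >= 0 whenever |b| = a or a = c."""
    h = 0
    a = 1
    while 3 * a * a <= -D:                # reduced forms satisfy a <= sqrt(|D|/3)
        for b in range(-a + 1, a + 1):    # -a < b <= a
            if (b - D) % 2:               # b must have the same parity as D
                continue
            num = b * b - D               # this is 4ac
            if num % (4 * a):
                continue
            c = num // (4 * a)
            if c < a or (c == a and b < 0):
                continue
            if gcd(gcd(a, b), c) == 1:
                h += 1
        a += 1
    return h

def h(d):
    return class_number(field_discriminant(d))

print([d for d in range(-1, -200, -1) if is_squarefree(d) and h(d) == 1])
# [-1, -2, -3, -7, -11, -19, -43, -67, -163]
```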
The most celebrated application of the arithmetic of elliptic curves has been the
recent proof of Fermat’s last theorem. In his copy of the translation by Bachet of 
Diophantus’ Arithmetica Fermat also wrote “It is impossible to separate a cube into 
two cubes, or a fourth power into two fourth powers or, in general, any power higher 
than the second into two like powers. I have discovered a truly marvellous proof of 
this, which this margin is too narrow to contain.” 

In other words, Fermat asserted that, if n > 2, the equation

$$x^n + y^n = z^n$$


has no solutions in nonzero integers x, y, z. In §2 of Chapter III we pointed out that it 
was sufficient to prove his assertion when n = 4 and when n = p is an odd prime, and 
we gave a proof there for n = 3. 

A nice application to cubic curves of the case n = 3 was made by Kronecker
(1859). If we make the change of variables

$$x = 2a/(3b - 1), \quad y = (3b + 1)/(3b - 1),$$

with inverse

$$a = x/(y - 1), \quad b = (y + 1)/3(y - 1),$$

then

$$x^3 + y^3 - 1 = 2(4a^3 + 27b^2 + 1)/(3b - 1)^3.$$

Since the equation $x^3 + y^3 = 1$ has no solution in nonzero rational numbers, the only
solutions in rational numbers of the equation

$$4a^3 + 27b^2 = -1$$

are a = −1, b = ±1/3. (The solution a = −1, b = −1/3 corresponds to the trivial solution
x = 1, y = 0, while b = 1/3, for which the change of variables is undefined, gives a = −1
directly.) Consequently the only cubic curves $\mathcal{C}_{a,b}$ with rational coefficients a, b and
discriminant d = −1 are $Y^2 - X^3 + X \pm 1/3$.
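Kronecker’s change of variables can be spot-checked with exact rationals. The sketch below verifies, on a few arbitrary sample values of (a, b) with b ≠ 1/3, that the two maps are mutually inverse and that the stated identity holds. It then confirms that (a, b) = (−1, −1/3) maps to the trivial solution x = 1, y = 0. Function names are illustrative.

```python
from fractions import Fraction as F

def kronecker_xy(a, b):
    """(a, b) -> (x, y) = (2a/(3b - 1), (3b + 1)/(3b - 1)); requires b != 1/3."""
    return 2 * a / (3 * b - 1), (3 * b + 1) / (3 * b - 1)

def kronecker_ab(x, y):
    """Inverse map (x, y) -> (a, b) = (x/(y - 1), (y + 1)/3(y - 1)); requires y != 1."""
    return x / (y - 1), (y + 1) / (3 * (y - 1))

for a, b in [(F(2), F(5)), (F(-1, 2), F(3, 7)), (F(7, 3), F(-2))]:
    x, y = kronecker_xy(a, b)
    assert kronecker_ab(x, y) == (a, b)
    assert x**3 + y**3 - 1 == 2 * (4 * a**3 + 27 * b**2 + 1) / (3 * b - 1) ** 3

assert kronecker_xy(F(-1), F(-1, 3)) == (1, 0)
```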

We return now to Fermat’s assertion. In the present section we have already given
Fermat’s own proof for n = 4. Suppose now that p ≥ 5 is prime and assume, contrary
to Fermat’s assertion, that the equation

$$a^p + b^p + c^p = 0$$

does have a solution in nonzero integers a, b, c. By removing any common factor we
may assume that (a, b) = 1, and then also (a, c) = (b, c) = 1. Since a, b, c cannot all
be odd, we may assume that b is even. Then a and c are odd, and (replacing a, b, c by
−a, −b, −c if necessary) we may assume that a ≡ −1 mod 4.

We now consider the projective cubic curve $\mathcal{C}_{A,B}$ defined by the polynomial

$$Y^2 - X(X - A)(X + B),$$

where $A = a^p$ and $B = b^p$. By construction, (A, B) = 1 and

$$A \equiv -1 \bmod 4, \quad B \equiv 0 \bmod 32.$$
Moreover, if we put C = −(A + B), then C ≠ 0 and (A, C) = (B, C) = 1. The linear
change of variables

$$X \mapsto 4X, \quad Y \mapsto 8Y + 4X$$

replaces $\mathcal{C}_{A,B}$ by the elliptic curve $\mathcal{W}_{A,B}$ defined by

$$Y^2 + XY - \{X^3 + (B - A - 1)X^2/4 - ABX/16\},$$

which has discriminant

$$\Delta = (ABC)^2/2^8.$$

Our hypotheses ensure that the coefficients of $\mathcal{W}_{A,B}$ are integers and that Δ is a nonzero
integer. It may be shown that $\mathcal{W}_{A,B}$ is actually a minimal model for $\mathcal{C}_{A,B}$. Moreover,
when we reduce modulo any prime ℓ which divides Δ, the singular point which arises
is a node. Thus $\mathcal{W}_{A,B}$ is semi-stable and its conductor N is the product of the distinct
primes dividing ABC.
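The change of variables and the discriminant can be checked mechanically. The sketch below relies on the standard formula for the discriminant of a general Weierstrass equation in terms of $b_2, b_4, b_6, b_8$, which is not given in this section. The sample pairs (A, B) are arbitrary integers with A ≡ −1 mod 4 and B ≡ 0 mod 32. They are not p-th powers, since the whole point is that no such pairs exist.

```python
def weierstrass_discriminant(a1, a2, a3, a4, a6):
    """Discriminant of Y^2 + a1XY + a3Y = X^3 + a2X^2 + a4X + a6 via the usual b-invariants."""
    b2 = a1 * a1 + 4 * a2
    b4 = 2 * a4 + a1 * a3
    b6 = a3 * a3 + 4 * a6
    b8 = a1 * a1 * a6 + 4 * a2 * a6 - a1 * a3 * a4 + a2 * a3 * a3 - a4 * a4
    return -b2 * b2 * b8 - 8 * b4**3 - 27 * b6 * b6 + 9 * b2 * b4 * b6

def frey_coefficients(A, B):
    """(a1, a2, a3, a4, a6) of W_{A,B}; assumes A = -1 mod 4 and B = 0 mod 32."""
    assert A % 4 == 3 and B % 32 == 0
    return 1, (B - A - 1) // 4, 0, -(A * B) // 16, 0

def check(A, B):
    C = -(A + B)
    a1, a2, a3, a4, a6 = frey_coefficients(A, B)
    # X -> 4X, Y -> 8Y + 4X turns the polynomial of C_{A,B} into 64 times that of W_{A,B}.
    for X in range(-3, 4):
        for Y in range(-3, 4):
            old = (8 * Y + 4 * X) ** 2 - 4 * X * (4 * X - A) * (4 * X + B)
            new = Y * Y + a1 * X * Y + a3 * Y - (X**3 + a2 * X * X + a4 * X + a6)
            assert old == 64 * new
    disc = weierstrass_discriminant(a1, a2, a3, a4, a6)
    assert 2**8 * disc == (A * B * C) ** 2
    return disc

print(check(3, 32))   # 44100
```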

Fermat’s last theorem will be proved, for any prime p ≥ 5, if we show that such an
elliptic curve cannot exist if A, B, C are all p-th powers. If p is large, one reason for
suspecting that such an elliptic curve cannot exist is that the discriminant is then very
large compared with the conductor. Another reason, which does not depend on the size
of p, was suggested by Frey (1986). Frey gave a heuristic argument that $\mathcal{W}_{A,B}$ could
not then be modular, which would contradict the TW-conjecture.

Frey’s intuition was made more precise by Serre (1987). Let G be the group of
all automorphisms of the field of all algebraic numbers. With any modular form for
$\Gamma_0(N)$ one can associate a 2-dimensional representation of G over a finite field. Serre
showed that Fermat’s last theorem would follow from the TW-conjecture, together with
a conjecture about lowering the level of such ‘Galois representations’ associated with
modular forms. The latter conjecture was called Serre’s $\varepsilon$-conjecture, because it was a
special case of a much more general conjecture which Serre made.

Serre’s $\varepsilon$-conjecture was proved by Ribet (1990), although the proof might be
described as being of order $\varepsilon^{-1}$. Now, for the first time, the falsity of Fermat’s last
theorem would have a significant consequence: the falsity of the TW-conjecture. Since
$\mathcal{W}_{A,B}$ is semi-stable with the normalizations made above, to prove Fermat’s last
theorem it was actually enough to show that any semi-stable elliptic curve was modular.
As stated in §5, this was accomplished by Wiles (1995) and Taylor and Wiles (1995).
We will not attempt to describe the proof since, besides Fermat’s classic excuse, it is 
beyond the scope of this work. 

Fermat’s last theorem contributed greatly to the development of mathematics, but 
Fermat was perhaps lucky that his assertion turned out to be correct. After proving 
Fermat’s assertion for n = 3, that the cube of a positive integer could not be the sum 
of two cubes of positive integers, Euler asserted that, also for any n ≥ 4, an n-th power
of a positive integer could not be expressed as a sum of n − 1 n-th powers of positive
integers. A counterexample to Euler’s conjecture was first found, for n = 5, by Lander
and Parkin (1966):

$$27^5 + 84^5 + 110^5 + 133^5 = 144^5.$$
Elkies (1988) used the arithmetic of elliptic curves to find infinitely many
counterexamples for n = 4, the simplest being

$$95800^4 + 217519^4 + 414560^4 = 422481^4.$$
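Python’s arbitrary-precision integers confirm both identities directly. The small search is a meet-in-the-middle brute force of our own devising. It is not the method of Lander and Parkin, and it is not the elliptic-curve method of Elkies. It rediscovers the n = 5 example as the first one below the bound.

```python
def euler_n5_search(limit):
    """Find a^5 + b^5 + c^5 + d^5 = e^5 with 0 < a, b, c, d < e < limit, smallest e first."""
    fifth = [k**5 for k in range(limit)]
    pair_sums = {}
    for a in range(1, limit):
        for b in range(a, limit):
            pair_sums.setdefault(fifth[a] + fifth[b], (a, b))
    for e in range(1, limit):
        for c in range(1, e):
            for d in range(c, e):
                rest = fifth[e] - fifth[c] - fifth[d]
                if rest <= 0:
                    break                      # rest only decreases as d grows
                if rest in pair_sums:
                    a, b = pair_sums[rest]
                    return sorted((a, b, c, d)), e
    return None

print(27**5 + 84**5 + 110**5 + 133**5 == 144**5)                 # True
print(95800**4 + 217519**4 + 414560**4 == 422481**4)            # True
print(euler_n5_search(145))                                      # ([27, 84, 110, 133], 144)
```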


A prize has been offered by Beal (1997) for a proof or disproof of his conjecture
that the equation

$$x^l + y^m = z^n$$

has no solution in coprime positive integers x, y, z if l, m, n are integers > 2. (The
exponent 2 must be excluded since, for example, $2^5 + 7^2 = 3^4$ and $2^7 + 17^3 = 71^2$.)
Will Beal’s conjecture turn out to be like Fermat’s or like Euler’s?
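A tiny brute-force search illustrates why the exponent 2 must be excluded. The Python sketch below is our own illustration. Once exponent 2 is allowed, it finds both examples quoted above among small bases. With all exponents at least 3, it finds nothing in the same range, although that proves nothing about the conjecture. If gcd(x, y) = 1, then x, y, z are automatically pairwise coprime, so only gcd(x, y) is tested.

```python
from math import gcd

def integer_root(s, n):
    """Return r if s == r**n for a positive integer r, else None."""
    r = round(s ** (1.0 / n))
    for cand in (r - 1, r, r + 1):
        if cand > 0 and cand**n == s:
            return cand
    return None

def beal_search(max_base, exponents):
    """All (x, l, y, m, z, n) with x^l + y^m = z^n, x, y <= max_base, gcd(x, y) = 1."""
    hits = []
    for x in range(1, max_base + 1):
        for y in range(1, max_base + 1):
            if gcd(x, y) != 1:
                continue
            for l in exponents:
                for m in exponents:
                    s = x**l + y**m
                    for n in exponents:
                        z = integer_root(s, n)
                        if z is not None:
                            hits.append((x, l, y, m, z, n))
    return hits

print(beal_search(30, range(3, 7)))   # []
```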


7 Further Remarks 


For sums of squares, see Grosswald [31], Rademacher [46], and Volume I, Chapter IX 
of Dickson [23]. A recent contribution is Milne [42]. 

A general reference for the theory of partitions is Andrews [2]. Proposition 4 is 
often referred to as Euler’s pentagonal number theorem, since $m(3m - 1)/2$ (m ≥ 1)
represents the number of dots needed to construct successively larger and larger pen- 
tagons. A direct proof of the combinatorial interpretation of Proposition 4 was given 
by Franklin (1881). It is reproduced in Andrews [2] and in van Lint and Wilson [41]. 
The replacement of proofs using generating functions by purely combinatorial proofs 
has become quite an industry; see, for example, Bressoud and Zeilberger [13], [14]. 
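The statement of Proposition 4 is not reproduced in this section. The sketch assumes it is the usual form of Euler’s pentagonal number theorem, $\prod_{k \ge 1}(1 - x^k) = \sum_{m \in \mathbb{Z}} (-1)^m x^{m(3m-1)/2}$. It compares the two sides coefficient by coefficient up to a chosen degree. Function names are illustrative.

```python
def euler_product_coeffs(N):
    """Coefficients of prod_{k>=1} (1 - x^k), truncated at x^N."""
    coeffs = [1] + [0] * N
    for k in range(1, N + 1):
        for i in range(N, k - 1, -1):      # multiply by (1 - x^k) in place, high degree first
            coeffs[i] -= coeffs[i - k]
    return coeffs

def pentagonal_series_coeffs(N):
    """Coefficients of sum over all integers m of (-1)^m x^{m(3m-1)/2}, truncated at x^N."""
    coeffs = [0] * (N + 1)
    for m in range(-N, N + 1):
        e = m * (3 * m - 1) // 2           # generalized pentagonal number, always >= 0
        if e <= N:
            coeffs[e] += (-1) ** (m % 2)
    return coeffs

print(pentagonal_series_coeffs(15))
assert euler_product_coeffs(200) == pentagonal_series_coeffs(200)
```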

Besides the q-difference equations used in the proof of Proposition 5, there are also 
q-integrals: 


f @)dgx = >) f(aq")(aq" — aq"*"). 


n>0 
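For $f(x) = x^k$ the q-integral is a geometric series. Its sum is $a^{k+1}(1 - q)/(1 - q^{k+1})$, which tends to the ordinary integral $a^{k+1}/(k+1)$ as q → 1. A minimal Python sketch truncates the series after a fixed number of terms, an assumption that is adequate for 0 < q < 1 not too close to 1, and compares it with the closed form.

```python
def jackson_integral(f, a, q, terms=2000):
    """Truncated q-integral: sum over n >= 0 of f(a q^n)(a q^n - a q^(n+1)), for 0 < q < 1."""
    total = 0.0
    for n in range(terms):
        x = a * q**n
        total += f(x) * (x - x * q)
    return total

k, a = 2, 1.0
for q in (0.5, 0.9, 0.99):
    closed_form = a ** (k + 1) * (1 - q) / (1 - q ** (k + 1))
    print(q, jackson_integral(lambda x: x**k, a, q), closed_form)
```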


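The q-binomial coefficients (mentioned in §2 of Chapter II)

$$\begin{bmatrix} n \\ m \end{bmatrix} = \begin{bmatrix} n \\ n-m \end{bmatrix} = (q)_n/(q)_m(q)_{n-m} \quad (0 \le m \le n),$$

where $(a)_0 = 1$ and

$$(a)_n = (1 - a)(1 - aq)\cdots(1 - aq^{n-1}) \quad (n \ge 1),$$

have recurrence properties similar to those of ordinary binomial coefficients:

$$\begin{bmatrix} n \\ m \end{bmatrix} = \begin{bmatrix} n-1 \\ m-1 \end{bmatrix} + q^m \begin{bmatrix} n-1 \\ m \end{bmatrix} = \begin{bmatrix} n-1 \\ m \end{bmatrix} + q^{n-m} \begin{bmatrix} n-1 \\ m-1 \end{bmatrix} \quad (0 < m < n).$$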
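Polynomials in q can be represented as coefficient lists. The sketch below builds the q-binomial coefficients from the first recurrence. It then checks the symmetry, the product formula with $(q)_n$, the second recurrence, and the specialization q = 1 to ordinary binomial coefficients, all for small n. The helper names are illustrative.

```python
from functools import lru_cache
from math import comb

def trim(p):
    """Drop trailing zero coefficients (keep at least one entry)."""
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

def add(p, r):
    n = max(len(p), len(r))
    return trim([(p[i] if i < len(p) else 0) + (r[i] if i < len(r) else 0) for i in range(n)])

def mul(p, r):
    out = [0] * (len(p) + len(r) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(r):
            out[i + j] += a * b
    return trim(out)

def shift(p, k):
    """Multiply a polynomial in q by q^k."""
    return [0] * k + list(p)

def q_poch(n):
    """(q)_n = (1 - q)(1 - q^2)...(1 - q^n), coefficients in increasing powers of q."""
    p = [1]
    for k in range(1, n + 1):
        p = mul(p, [1] + [0] * (k - 1) + [-1])
    return p

@lru_cache(maxsize=None)
def qbinom(n, m):
    """q-binomial coefficient built from [n, m] = [n-1, m-1] + q^m [n-1, m]."""
    if m < 0 or m > n:
        return (0,)
    if m == 0 or m == n:
        return (1,)
    return tuple(add(list(qbinom(n - 1, m - 1)), shift(qbinom(n - 1, m), m)))

for n in range(1, 9):
    for m in range(n + 1):
        b = list(qbinom(n, m))
        assert b == list(qbinom(n, n - m))                            # symmetry
        assert mul(b, mul(q_poch(m), q_poch(n - m))) == q_poch(n)     # product formula
        assert sum(b) == comb(n, m)                                   # q = 1
        if 0 < m < n:                                                 # second recurrence
            assert b == add(list(qbinom(n - 1, m)), shift(qbinom(n - 1, m - 1), n - m))

print(qbinom(4, 2))   # (1, 1, 2, 1, 1)
```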


The q-hypergeometric series

$$\sum_{n \ge 0} (a)_n(b)_n x^n/(c)_n(q)_n$$
was already studied by Heine (1847). There is indeed a whole world of q-analysis,
which may be regarded as having the same relation to classical analysis as
quantum mechanics has to classical mechanics. (The choice of the letter ‘q’ nearly a
century before the advent of quantum mechanics showed remarkable foresight.) There
are introductions to this world in Andrews et al. [4] and Vilenkin and Klimyk [58].
For Macdonald’s conjectures concerning q-analogues of orthogonal polynomials, see
Kirillov [36].

Although q-analysis always had its devotees, it remained outside the mainstream
of mathematics until recently. Now it arises naturally in the study of quantum groups, 
which are not groups but q-deformations of the universal enveloping algebra of a Lie 
algebra. 


The Rogers—Ramanujan identities were discovered independently by Rogers 
(1894), Ramanujan (1913) and Schur (1917). Their romantic history is retold in 
Andrews [2], which contains also generalizations. For the applications of the iden- 
tities in statistical mechanics, see Baxter’s article (pp. 69-84) in Andrews et al. [3]. 
(The same volume contains other interesting articles on mathematical developments 
arising from Ramanujan’s work.) 


The Jacobi triple product formula was derived in Chapter XII as the limit of a 
formula for polynomials. Andrews [1] has given a similar derivation of the Rogers— 
Ramanujan identities. This approach has found applications and generalizations in 
conformal field theory, with the two sides of the polynomial identity corresponding 
to fermionic and bosonic bases for Fock space; see Berkovich and McCoy [9]. 


These connections go much further than the Rogers—Ramanujan identities. There 
is now a vast interacting area which involves, besides the theory of partitions, solv- 
able models of statistical mechanics, conformal field theory, integrable systems in 
classical and quantum mechanics, infinite-dimensional Lie algebras, quantum groups, 
knot theory and operator algebras. For introductory accounts, see [45], [10] and 
various articles in [24] and [27]. More detailed treatments of particular aspects are 
given in Baxter [8], Faddeev and Takhtajan [26], Jantzen [33], Jones [34], Kac [35] 
and Korepin et al. [38]. 


For the Hardy–Ramanujan–Rademacher expansion for p(n), see Rademacher [46]
and Andrews [2]. An interesting proof by means of probability theory for the first term
of the expansion has been given by Báez-Duarte [5].


The definition of birational equivalence in §3 is adequate for our purposes, but has 
been superseded by a more general definition in the language of ‘schemes’, which is 
applicable to algebraic varieties of arbitrary dimension without any given embedding 
in a projective space. For the evolution of the modern concept, see Cizmar [18]. 


The history of the discovery of the group law on a cubic curve is described by 
Schappacher [48]. 


Several good accounts of the arithmetic of elliptic curves are now available; e.g., 
Knapp [37] and the trilogy [52], [50], [51]. Although the subject has been transformed 
in the past 25 years, the survey articles by Cassels [16], Tate [55] and Gelbart [28] are 
still of use. Tate gives a helpful introduction, Cassels has many references to the older 
literature, and Gelbart explains the connection with the Langlands program, for which 
see also Gelbart [29]. 
For reference, we give here the formulas for addition on an elliptic curve in the
so-called Weierstrass’s normal form. If $P_1 = (x_1, y_1)$ and $P_2 = (x_2, y_2)$ are points of
the curve

$$Y^2 + a_1XY + a_3Y = X^3 + a_2X^2 + a_4X + a_6,$$

then

$$-P_1 = (x_1, -y_1 - a_1x_1 - a_3), \qquad P_1 + P_2 = P_3 = (x_3, -y_3),$$

where

$$x_3 = \lambda(\lambda + a_1) - a_2 - x_1 - x_2, \qquad y_3 = (\lambda + a_1)x_3 + \mu + a_3,$$

and

$$\lambda = (y_2 - y_1)/(x_2 - x_1), \qquad \mu = (y_1x_2 - y_2x_1)/(x_2 - x_1) \qquad \text{if } x_1 \ne x_2;$$
$$\lambda = (3x_1^2 + 2a_2x_1 + a_4 - a_1y_1)/N, \qquad \mu = (-x_1^3 + a_4x_1 + 2a_6 - a_3y_1)/N,$$

with $N = 2y_1 + a_1x_1 + a_3$, if $x_1 = x_2$ and $P_2 \ne -P_1$.
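These formulas translate directly into code. The Python sketch below uses exact rationals and represents the identity element by `None`. The test curves and points are our own examples. P = (−4, 6) on $y^2 = x^3 - 25x$ doubles to (1681/144, −62279/1728). The point (1, 1) lies on $Y^2 + XY = X^3 + 7X^2 - 6X$, the curve $\mathcal{W}_{3,32}$ from §6, which exercises the $a_1$ terms.

```python
from fractions import Fraction as F

O = None   # the identity element (point at infinity)

def ec_neg(P, a):
    """-P = (x1, -y1 - a1 x1 - a3)."""
    a1, a2, a3, a4, a6 = a
    if P is O:
        return O
    x1, y1 = P
    return (x1, -y1 - a1 * x1 - a3)

def ec_add(P1, P2, a):
    """P1 + P2 on Y^2 + a1XY + a3Y = X^3 + a2X^2 + a4X + a6, where a = (a1, a2, a3, a4, a6)."""
    a1, a2, a3, a4, a6 = a
    if P1 is O:
        return P2
    if P2 is O:
        return P1
    (x1, y1), (x2, y2) = P1, P2
    if x1 != x2:                                # chord through P1 and P2
        lam = (y2 - y1) / (x2 - x1)
        mu = (y1 * x2 - y2 * x1) / (x2 - x1)
    elif P2 == ec_neg(P1, a):                   # P2 = -P1, including points of order 2
        return O
    else:                                       # P2 = P1: tangent at P1
        N = 2 * y1 + a1 * x1 + a3
        lam = (3 * x1 * x1 + 2 * a2 * x1 + a4 - a1 * y1) / N
        mu = (-x1**3 + a4 * x1 + 2 * a6 - a3 * y1) / N
    x3 = lam * (lam + a1) - a2 - x1 - x2
    y3 = (lam + a1) * x3 + mu + a3
    return (x3, -y3)

def on_curve(P, a):
    a1, a2, a3, a4, a6 = a
    x, y = P
    return y * y + a1 * x * y + a3 * y == x**3 + a2 * x * x + a4 * x + a6

C5 = (0, 0, 0, -25, 0)      # y^2 = x^3 - 25x
P = (F(-4), F(6))
print(ec_add(P, P, C5))     # (Fraction(1681, 144), Fraction(-62279, 1728))
```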


An algorithm for obtaining a minimal model of an elliptic curve is described in 
Laska [40]. Other algorithms connected with elliptic curves are given in Cremona [21]. 

The original conjecture of Birch and Swinnerton-Dyer was generalized by Tate [54] 
and Bloch [11]. For a first introduction to the theory of modular forms see Serre [49], 
and for a second see Lang [39]. 

Hasse actually showed that, if $\mathcal{E}$ is an elliptic curve over any finite field $\mathbb{F}_q$ containing
q elements, then the number $N_q$ of $\mathbb{F}_q$-points on $\mathcal{E}$ (including the point at infinity)
satisfies the inequality

$$|N_q - (q + 1)| \le 2q^{1/2}.$$

For an elementary proof, see Chahal [17]. Hasse’s result is the special case, when the
genus g = 1, of the Riemann hypothesis for function fields, which was mentioned in
Chapter IX, §5.
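For prime fields the bound can be observed by direct counting. The sketch below is restricted, as an assumption, to odd primes p and short Weierstrass curves $y^2 = x^3 + a_4x + a_6$. Primes of bad reduction are skipped. The bound is checked in integers as $(N - (p+1))^2 \le 4p$. For $y^2 = x^3 - n^2x$ and p ≡ 3 mod 4 the count is exactly p + 1, which gives an easy spot check.

```python
def count_points(a4, a6, p):
    """Points on y^2 = x^3 + a4 x + a6 over F_p (p an odd prime), including the point at infinity."""
    squares = [0] * p
    for y in range(p):
        squares[y * y % p] += 1            # squares[r] = number of y with y^2 = r mod p
    total = 1                              # the point at infinity
    for x in range(p):
        total += squares[(x**3 + a4 * x + a6) % p]
    return total

def primes_up_to(n):
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(n**0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = [False] * len(range(i * i, n + 1, i))
    return [i for i, is_p in enumerate(sieve) if is_p]

def hasse_ok(a4, a6, p):
    N = count_points(a4, a6, p)
    return (N - (p + 1)) ** 2 <= 4 * p     # |N - (p + 1)| <= 2 sqrt(p), in integers

# y^2 = x^3 - 25x has bad reduction only at 2 and 5.
assert all(hasse_ok(-25, 0, p) for p in primes_up_to(300) if p not in (2, 5))
print([(p, count_points(-25, 0, p)) for p in primes_up_to(30) if p not in (2, 5)])
```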

It follows from the result of Siegel (1929), mentioned in §9 of Chapter IV, and even 
from the earlier work of Thue (1909), that an elliptic curve with integral coefficients 
has at most finitely many integral points. However, their method is not constructive. 
Baker [6], using the results on linear forms in the logarithms of algebraic numbers 
which he developed for the theory of transcendental numbers, obtained an explicit up- 
per bound for the magnitude of any integral point in terms of an upper bound for the 
absolute values of all coefficients. Sharper bounds have since been obtained, e.g. by 
Bugeaud [15]. (For modern proofs of Baker’s theorem on the linear independence of 
logarithms of algebraic numbers, see Waldschmidt [59]. The history of Baker’s method 
is described in Baker [7].) 

For information about the proof of Mordell’s conjecture we refer to Bloch [12], 
Szpiro [53], and Cornell and Silverman [19]. The last includes an English translation 
of Faltings’ original article. As mentioned in §9 of Chapter IV, Vojta (1991) has given 
a proof of the Mordell conjecture which is completely different from that of Faltings. 
There is an exposition of this proof, with simplifications due to Bombieri (1990), in 
Hindry and Silverman [32]. 
For congruent numbers, see Volume II, Chapter XVI of Dickson [23], Tunnell [57],
and Noda and Wada [43]. The survey articles of Goldfeld [30] and Oesterlé [44] deal 
with Gauss’s class number problem. 

References for earlier work on Fermat’s last theorem were given in Chapter III. 
Ribet [47] and Cornell et al. [20] provide some preparation for the original papers of 
Wiles [60] and Taylor and Wiles [56]. For the TW-conjecture, see also Darmon [22]. 
For Euler’s conjecture, see Elkies [25]. 


8 Selected References 


[1] G.E. Andrews, A polynomial identity which implies the Rogers-Ramanujan identities, 
Scripta Math. 28 (1970), 297-305. 

[2] G.E. Andrews, The theory of partitions, Addison-Wesley, Reading, Mass., 1976. 
[Paperback edition, Cambridge University Press, 1998] 

[3] G.E. Andrews, R.A. Askey, B.C. Berndt, K.G. Ramanathan and R.A. Rankin (ed.), 
Ramanujan revisited, Academic Press, London, 1988. 

[4] G.E. Andrews, R. Askey and R. Roy, Special functions, Cambridge University Press, 1999. 

[5] L. Báez-Duarte, Hardy-Ramanujan’s asymptotic formula for partitions and the central
limit theorem, Adv. in Math. 125 (1997), 114-120. 

[6] A. Baker, The diophantine equation y^2 = ax^3 + bx^2 + cx + d, J. London Math. Soc. 43
(1968), 1-9. 

[7] A. Baker, The theory of linear forms in logarithms, Transcendence theory: advances and 
applications (ed. A. Baker and D.W. Masser), pp. 1-27, Academic Press, London, 1977. 

[8] R.J. Baxter, Exactly solved models in statistical mechanics, Academic Press, London, 
1982. [Reprinted, 1989] 

[9] A. Berkovich and B.M. McCoy, Rogers-Ramanujan identities: a century of progress from 
mathematics to physics, Proceedings of the International Congress of Mathematicians: 
Berlin 1998, Vol. III, pp. 163-172, Documenta Mathematica, Bielefeld, 1998. 

[10] J.S. Birman, New points of view in knot theory, Bull. Amer. Math. Soc. (N.S.) 28 (1993), 
253-287. 

[11] S. Bloch, A note on height pairings, Tamagawa numbers, and the Birch and Swinnerton- 
Dyer conjecture, Invent. Math. 58 (1980), 65-76. 

[12] S. Bloch, The proof of the Mordell conjecture, Math. Intelligencer 6 (1984), no. 2, 41-47. 

[13] D.M. Bressoud and D. Zeilberger, A short Rogers-Ramanujan bijection, Discrete Math. 
38 (1982), 313-315. 

[14] D.M. Bressoud and D. Zeilberger, Bijecting Euler’s partitions-recurrence, Amer. Math. 
Monthly 92 (1985), 54-55. 

[15] Y. Bugeaud, On the size of integer solutions of elliptic equations, Bull. Austral. Math. Soc. 
57 (1998), 199-206. 

[16] J.W.S. Cassels, Diophantine equations with special reference to elliptic curves, J. London 
Math. Soc. 41 (1966), 193-291. 

[17] J.S. Chahal, Manin’s proof of the Hasse inequality revisited, Nieuw Arch. Wisk. (4) 13 
(1995), 219-232. 

[18] J. Cizmar, Birationale Transformationen (Ein historischer Überblick), Period. Polytech.
Mech. Engrg. 39 (1995), 9-24. 

[19] G. Cornell and J.H. Silverman (ed.), Arithmetic geometry, Springer-Verlag, New York, 
1986. 

[20] G. Cornell, J.H. Silverman and G. Stevens (ed.), Modular forms and Fermat’s last
theorem, Springer, New York, 1997. 


[21] J.E. Cremona, Algorithms for modular elliptic curves, 2nd ed., Cambridge University
Press, 1997.

[22] H. Darmon, A proof of the full Shimura–Taniyama–Weil conjecture is announced, Notices
Amer. Math. Soc. 46 (1999), 1397-1401.

[23] L.E. Dickson, History of the theory of numbers, 3 vols., Carnegie Institute, Washington,
D.C., 1919-1923. [Reprinted Chelsea, New York, 1992]

[24] L. Ehrenpreis and R.C. Gunning (ed.), Theta functions: Bowdoin 1987, Proc. Symp. Pure
Math. 49, Amer. Math. Soc., Providence, R.I., 1989.

[25] N.D. Elkies, On A^4 + B^4 + C^4 = D^4, Math. Comp. 51 (1988), 825-835.

[26] L.D. Faddeev and L.A. Takhtajan, Hamiltonian methods in soliton theory, Springer-Verlag,
Berlin, 1987.

[27] A.S. Fokas and V.E. Zakharov (ed.), Important developments in soliton theory, Springer-
Verlag, Berlin, 1993.

[28] S. Gelbart, Elliptic curves and automorphic representations, Adv. in Math. 21 (1976),
235-292.

[29] S. Gelbart, An elementary introduction to the Langlands program, Bull. Amer. Math. Soc.
(N.S.) 10 (1984), 177-219.

[30] D. Goldfeld, Gauss’ class number problem for imaginary quadratic fields, Bull. Amer.
Math. Soc. (N.S.) 13 (1985), 23-37.

[31] E. Grosswald, Representations of integers as sums of squares, Springer-Verlag, New York,
1985.

[32] M. Hindry and J.H. Silverman, Diophantine geometry, Springer, New York, 2000.

[33] J.C. Jantzen, Lectures on quantum groups, American Mathematical Society, Providence,
R.I., 1996.

[34] V.F.R. Jones, Subfactors and knots, CBMS Regional Conference Series in Mathematics
80, Amer. Math. Soc., Providence, R.I., 1991.

[35] V.G. Kac, Infinite-dimensional Lie algebras, 3rd ed., Cambridge University Press, 1990.

[36] A.A. Kirillov, Jr., Lectures on affine Hecke algebras and Macdonald’s conjectures, Bull.
Amer. Math. Soc. (N.S.) 34 (1997), 251-292.

[37] A.W. Knapp, Elliptic curves, Princeton University Press, Princeton, N.J., 1992.

[38] V.E. Korepin, N.M. Bogoliubov and A.G. Izergin, Quantum inverse scattering method
and correlation functions, Cambridge University Press, 1993.

[39] S. Lang, Introduction to modular forms, Springer-Verlag, Berlin, corr. reprint, 1995.

[40] M. Laska, An algorithm for finding a minimal Weierstrass equation for an elliptic curve,
Math. Comp. 38 (1982), 257-260.

[41] J.H. van Lint and R.M. Wilson, A course in combinatorics, Cambridge University Press,
1992.

[42] S.C. Milne, New infinite families of exact sums of squares formulas, Jacobi elliptic
functions and Ramanujan’s tau function, Proc. Nat. Acad. Sci. U.S.A. 93 (1996), 15004-15008.

[43] K. Noda and H. Wada, All congruent numbers less than 10000, Proc. Japan Acad. Ser.
A Math. Sci. 69 (1993), 175-178.

[44] J. Oesterlé, Le problème de Gauss sur le nombre de classes, Enseign. Math. 34 (1988),
43-67.

[45] M. Okado, M. Jimbo and T. Miwa, Solvable lattice models in two dimensions and modular
functions, Sugaku Exp. 2 (1989), 29-54.

[46] H. Rademacher, Topics in analytic number theory, Springer-Verlag, Berlin, 1973.

[47] K.A. Ribet, Galois representations and modular forms, Bull. Amer. Math. Soc. (N.S.) 32
(1995), 375-402.

[48] N. Schappacher, Développement de la loi de groupe sur une cubique, Séminaire de Théorie
des Nombres, Paris 1988-89 (ed. C. Goldstein), pp. 159-184, Birkhäuser, Boston, 1990.

[49] J.-P. Serre, A course in arithmetic, Springer-Verlag, New York, 1973.

[50] J.H. Silverman, The arithmetic of elliptic curves, Springer-Verlag, New York, 1986.

[51] J.H. Silverman, Advanced topics in the arithmetic of elliptic curves, Springer-Verlag,
New York, 1994.

[52] J.H. Silverman and J. Tate, Rational points on elliptic curves, Springer-Verlag, New York,
1992.

[53] L. Szpiro, La conjecture de Mordell [d’après G. Faltings], Astérisque 121-122 (1985),
83-103.

[54] J.T. Tate, On the conjectures of Birch and Swinnerton-Dyer and a geometric analog,
Séminaire Bourbaki: Vol. 1965/1966, Exposé no. 306, Benjamin, New York, 1966.

[55] J.T. Tate, The arithmetic of elliptic curves, Invent. Math. 23 (1974), 179-206.

[56] R.L. Taylor and A. Wiles, Ring theoretic properties of certain Hecke algebras, Ann. of
Math. 141 (1995), 553-572.

[57] J.B. Tunnell, A classical Diophantine problem and modular forms of weight 3/2, Invent.
Math. 72 (1983), 323-334.

[58] N. Ja. Vilenkin and A.V. Klimyk, Representation of Lie groups and special functions,
4 vols., Kluwer, Dordrecht, 1991-1995.

[59] M. Waldschmidt, Diophantine approximation on linear algebraic groups, Springer,
Berlin, 2000.

[60] A. Wiles, Modular elliptic curves and Fermat’s last theorem, Ann. of Math. 141 (1995),
443-551.


Additional References 


R.E. Borcherds, What is moonshine?, Proceedings of the International Congress 
of Mathematicians: Berlin 1998, Vol. I, pp. 607-615, Documenta Mathematica, 
Bielefeld, 1998. 

C. Breuil, B. Conrad, F. Diamond and R. Taylor, On the modularity of elliptic curves 
over Q, J. Amer. Math. Soc. 14 (2001), 843-939. 

Chandrasekhar Khare, Serre’s modularity conjecture, Preprint. 


Notations 


€, €,=,4,8,C,C,U,N, 2 
B\A, A‘S,3 

A x B, A", aRb, 3 

Ra, f: A B, f(a), f(A), 4 
i4,gof,4 

f,N, 1, S(@), 5 

Sm(n), 6 

a+b, py(n),a-b,7 
<,<,>,>,8 

Ins 9 

#(E), 10 

~,Z,+, 11 

0, -—a,b—a,-,1,12 

P,—P, 13 
P+P,P-P,a*,a <b, 14 
a/b, Z*, ~~, Q, 15 

+,-,a7!, 16 

P,—P, 16,17 
P,A<B,18 

A+ B,19 

AB, 20 

R, ~, 22 

Ja,a'l*, b”, “fa, a'/",R, 23 
limy—s00, dn 2 1,n > co, 24 
inf, sup, lim, ,,9, liMn—oo, 24 

[a, b], 26 

Ja|, d(a, b), 27 

Bs(x), A, int A, R”, |a|, 28 

lali, lala, F4, 28 

C1), 1fI I fli. flo, @(R), FS, 29 
limy—so0 dn = A, An a, 30 

E, 31 

EAD) 17), 32 


g' (xo), 33 

|A|, 34 

B,, 35 

e, 38, 187 

e’, 39, 45 

Inx, log x, 39 

C,i, 40 

Z, Bz, Zz, 41 

cos z, sinz, 45 

z, 46, 48, 186, 217, 364, 509 
HH, A, n(A), t(A), 49 

i, j,k, 51 

V(u), 52 

(x, y), SU2(C), SO3(R), S?, P?(R), 53 
SO4(R), O, €, 53 

a,n(a), 54 

e,a—!, 55 

ab, HK, 56 

Sn, SEA), Ln, 57 

Ha, G/H,58 
a",<a>,<S>,59 

Na, G x G’, 60 

M,(Z), A(X), A+ B, AB, 61 
na,a—', R*, 62 

R/S, 63 

RO R’,av,0 + w, D", 64 
€(1), O, @'(1), Uj +U2, U, ®U2, 65 
< S >, 66 

dimV,[FE: F],e1,...,é@n, Tv, TS, 67 
S+7,GL(V), Mn(F), 68 

V ® V',T @T’',68 

M,(D), 69 

(u, v), |v], 71 
B74 Mp, 158

Cot), 73 y, 159, 380 

+R, 76 F,, 160 
GL,(Z), 162 

bla, bia, x, (a, b), 83 AB, 163, 301 

[a, b], 84 M, 1 M2, M, + Mo, 166 

aNb,av b,85 A, 171 

(41, 42,..-,4n)s (a1, a2, .--, dn], 86 |a|, 173 

K{[t], 87, 96 (f/g), 174 

K(t), 88, 262 

mC, 92,111 lE|,é,,7, 179 

a(f), | fl, RIA. RII], 96 [ao, a1, a2,...], 179, 182, 212 

K[t, 17], 98 Pns Gn, 180 

c(f), 100 Pn/Gns 181, 212 

R[t,,...,tn], 101 [ao, 41,..., an], 182 

@ p(x), 102 M(é), 190 

f’, 103 D,191 

(a), 104 c’, 192 

QW), Ga, 105 [40,.+«+5/m—1; Ags og im podl 192 

a = bmod m, ¥, 106 €,198 

Zum), 107 KH, 201 

Zin» 108 I’, SL2(Z), T(z), S(z), 202 

7, 9(m), 109 R(z), 0(F), 203 

@, (x), 111, 112 F, 204 

f(x), 111 t(f), 206 

x 114 hi (D), 206, 207 

G,N(y), 119 H/T, 209, 218 

IH, 7,120 T(n), €, wp, 210 

N(y), (a, B)r, 121 Fwy 20 

Lx], g(k), w(k), 122 PSL2(R), 218 

G(k), 123 

K[[t1,..-5tmll, 124 det A, 224-229 

Fg, 125 My, diag[aj1, 422, ..., Ann], 225 
SLy(F), A‘, 226, 229 

(a/n), sgn(zq), 130 lll], 229, 234 

(a/p), 133 A ®B, 231 

G(m,n), 137 Ig 3088 

Q(V/d), a’, 140 €m, 233, 248 

N(a), w, 64,%, &, 141 2-(v, k, A), 247 

(a1,...,4m), AB, 145 t-(v, k, 2), 250 

A’, 146 Cp, Ln, PSLn(q), 251 

h(d), O(K), 151 M2, M11, M24, M23, M22, 251 

f * g, 152 |x|, 254 

d(n), L,|f |, 153 [n, k, d], C(H), Gra, R(1, m), 255 

i(n), j(n), 154 

t(n), a(n), 155 Ce ((z)), Ceffzl], lal, 261 


u(n), f(r), 156 |aloos Vp (a), |alp, K (¢), 262 




| floo. vp(f), lf lp. 262 
K ((t)), 263 

F, 270, 271 

Qp, lal], 271 

R,M,U, 274 

Lp, 275 

x, k, 276 

F(x), 280 


F*, F*, (u,v), 292 

f ~g,detV,U+, Vi LV2, 293 
ind V, 296 

indt V, ind~ V, 297 

Tw, 300 

A ® B, 301 

V ~ V’, W(F), (a,b) p, 303 
fa, Ga, Qoo, (4, D)oo, (a, b) p, 304 
fa,b» Ga,b, 307 

SF(f), Sp(f), 310 

Q», | |v, (4, bv, 313 

(a, b/ F), 324 


x(x), A(S), 327 

Kn, 328, 348, 380 
lq ||, 329 

<Y>,331 

d(A), IT, 333 

int S, A*, 334 

B,, 337 

ui(K, A), A(K), 339 
K*, 340 

Ix|, 341 

d(y, z), llxll, (y, z), 342 
Hy, Gy, Gy, 342 

V (x0), Br (x0), 342 
V(A), 344 

Br, m(A), 345 

y (A), yn, 347 

0(K), bn, 348 
C1,-+-52ns An; Dy, Eg, Ey, E¢, 350 
Ay4, 351 

Ly, Ay > A, 352 
h(K, K’), 352, 353 


a(x), logx, 364 

log” x, Li(x), Pn, 365 
O(x), w(x), 367 

Ly], 368 
A(n), C(s), 370

o,t, 371 

k(u), k(t), 374, 375 
O(x), Γ(z), z!, 379
B(x, y), κ_n, γ, Z(s), 380
a”, 382 

Yas Vn» 384 

|A|, ζ_K(s), 384

π_K(x), Z_K(s), 385
N(P), N(A), C1 (s), 386
τ(n), 388

M(x), 390 

π_2(x), L_2(x), C_2, 392


π(x; m, a), θ(x; m, a), 400
ψ(x; m, a), e, χ, 400

mG, g, 401 

Gin, 403 

L(s, χ), 404

Λ(s, χ), 409

ρ, 410, 434

pr, 410 

p@o, pu, 411 

tr A, 414

χ(s), 414, 434

8, Oil, Ce Ny, 415 

Xu XR, 416 

Ck, hk, Lik, 418 

Cr, Cy, 418 

0,6, A(s), 419 

w(s), 420 

A, y?, 421 

Sn, By, Cy, @, 423 

K,, 426 

€(G), M(f), 432 

Ff, G8), (BE), LP(G), f * g, 433 
KH, p, x(s), G, 434 

f, it, 435 

€(G), 436 

SU(2), S^3, 437

SO(3), 438 

T”, 439 

GL(n), O(n), U(n), Sp(n), 440 
[u, v], gl(n, R), gl(n, C), 440 
A_n, B_n, C_n, D_n, L(G), 440
G, 441 

G_2, F_4, E_6, E_7, E_8, 442
SU(n), SO(n), Spin(n), 442 
Lol, {Cf}. L, Pa,p(N), 448

Xa,p» 449 

e(t), 450 

a,p(N), 14, {x},m - x, 452 
Di, = Di(Ei,..-.EN)s PalN), 459 
on, 462 

D_N(x_1, ..., x_N), 464

B, A Δ B, o(P&), μ(B), 465
(X, B, μ), a.e., ∫_X f dμ, 466
L(X, B, μ), T^{-1}B, 466

A, Tg, 472 

Ra, 473 

Pis--+5 Pr> [d-m; ---, 4m], 476 


(X, d), A, 485 


g(x), 496 

lo, Ths In(y ), 497 

gi(x), 498, 510 

U, V, 499, 531 

a_n, b_n, M(a, b), 503

K (a, b), 504, 505 

E&(a, b), en, 505 

P(a, b, p), 505

P_n, Q_n, 506

c, # (a,c), Cn, 507 
E(a,c), K(A), E(A), 508 
fax), 510 

S(t) = S(t, 4), 510, 526 
E(u), 516, 530 

Π(u, a), 517, 530
q, Z, ϑ(v) = ϑ(v; τ), 519
ϑ_{α,β}(v) = ϑ_{α,β}(v; τ), 520
ϑ_00(v), ϑ_01(v), ϑ_10(v), ϑ_11(v), 520




ϑ_1(x, q), ϑ_2(x, q), 521
ϑ_3(x, q), ϑ_4(x, q), Q_0, 521
snu,cnu,dnu,u= 106) (0)0, 525 
A(t), 525, 531 

K(z), K’(r), 527 

@(u), E(K), 530 
H,U,V,T,S, 531 

tS ieee etek 

B,D, J, 534 

F(α, β; γ; z), 537

64(z), 538 


r_4(m), 541

σ(m), σ’(m), 542

r_2(m), 543

di(m), d3(m), r.(m), p(n), 544 
(a)_0, (a)_n, (a)_∞, 546

η(τ), 549

6,6, 550 

W = W(a_1, ..., a_6), 552, 558
C_{a,b}, 553, 558

0, d, 553 

P_1 + P_2, -P, 556, 583

E = E(Q), A(P), 559 

h(P), 560, 561 

(P, Q), 563 

Ca,B, D, E,N, 565 

E, E', Ef, 569 

Δ, b_2, b_4, b_6, b_8, 570

W_p, N_p, c_p, L(s) = L(s, W), 571
c_n, N = N(W), f_W, Λ(s), 572
r, EY, Q), 572

Γ_0(N), 573

Cn, 577 

A_+(n), A_-(n), 578

E4,B, 579 

Wap, 580 

[2 ]q, 581 


The Landau order symbols are defined in the following way: if $I = [t_0, \infty)$ is a half-line and if $f, g : I \to \mathbb{R}$ are real-valued functions with $g(t) > 0$ for all $t \in I$, we write

$f = O(g)$ if there exists a constant $C > 0$ such that $|f(t)|/g(t) \le C$ for all $t \in I$;

$f = o(g)$ if $f(t)/g(t) \to 0$ as $t \to \infty$;

$f \sim g$ if $f(t)/g(t) \to 1$ as $t \to \infty$.


The end of a proof is denoted by 


Axioms 


(A1)-(A3), 7 
(A4),(A5), 12 
(AM1), 7 
(AM2), 13 

(B1), (B2), 465 
(C1)-(C4), 106
(D1)-(D3), 27
(M1)-(M4), 7, 83
(M5), 16

(N1)-(N3), 5
(O1)-(O4), 8
(O4)’, 14
(P1)-(P3), 13
(P4), 19
(Pr1)-(Pr3), 465
(V1)-(V3), 261
(V3)’, 261


Index 


abelian 
group, 55, 79, 172, 196, 400, 434, 
441, 555, 556
Lie algebra, 441 
absolute value, 27, 41, 211, 261 
addition, 60 
of integers, 11 
of natural numbers, 6-7
of points of elliptic curve, 555-556, 
582 
of rational numbers, 16 
addition theorem for 
elliptic functions, 511, 527, 555
exponential function, 38, 45, 46, 77 
theta functions, 523 
trigonometric functions, 46, 528 
adeles, 443, 444 
affine 
conic, 549 
cubic, 549 
line, 549 
plane curve, 549-552 
AGM algorithm, 502-509, 512, 535, 
536 
algebra, texts, 78 
algebraic, 214 
addition theorem, 511, 537 
function field, 386-388, 395 
integer, 123, 141, 151 
number, 174, 214 
number field, 151, 174, 384-385 
algebraic number theory, texts, 174 


algebraic topology, 76, 78, 388 
algebraically closed field, 77 
almost 
all, 466 
everywhere, 466 
periodic function, 75, 79, 489 
alternating group, 57, 228, 251, 
423-425 
analysis 
complex, 48, 77 
quaternionic, 77 
real, 26, 76 
analytic continuation, 48, 381, 390, 
405, 515, 520, 527 
angle, 47, 209 
anisotropic subspace, 295, 297, 299, 
302 
approximation theorem for valuations, 
267 
arc length 
in hyperbolic geometry, 208 
of ellipse, 493, 494 
of lemniscate, 495 
archimedean absolute value, 262, 264, 
284 
Archimedean property, 22, 26, 76 
argument of complex number, 47 
arithmetic 
of elliptic curves, texts, 582 
of quaternions, 120-121, 126, 541 
progression, 312-313, 399, 443, 485, 
490, 492 




arithmetic-geometric mean, 502-509, 
512, 535-536 
arithmetical function, 152-154, 175
arithmetical functions, 153, 175 
Artin’s 
primitive root conjecture, 124-125, 
385, 388 
reciprocity law, 174 
associative, 4, 60 
algebra, 68-69, 78, 79 
law, 7, 55, 83
asymptote, 550 
automorphism group of 
code, 255 
Hadamard matrix, 252 
t-design, 250 
automorphism of 
H, O, 77 
group, 59 
Hadamard matrix, 252 
quadratic field, 140 
ring, 64 
torus, 473 


badly approximable number, 190, 194, 
217, 463, 489 

Baire’s category theorem, 487 
Baker’s theorem, 583 
baker’s transformation, 482 
balanced incomplete block design, 247 
Banach algebra, 434 
basis of 

lattice, 332, 333 

module, 164, 169, 173 

vector space, 66 
Bateman–Horn conjecture, 393-395
Beal’s conjecture, 581 
Bernoulli number, 151 
Bernoulli shift, one-sided, 479, 483 
Bernoulli shift, two-sided, 477, 483 
Bernstein’s theorem, 394 
Bertrand’s postulate, 366 
Bessel’s inequality, 74 
best approximation 

in inner product space, 73 

of real number, 186, 217 
beta integral, 380 
Bézout
domain, 95, 101, 124, 168 
identity, 91, 92, 109, 118, 121, 167
Bieberbach’s theorems, 346, 358 
bijection, 4 
bijective map, 4, 10 
binary 
digit (‘bit’), 255 
linear code, 255, 256 
operation, 55, 60 
quadratic form, 205-207, 218 
relation, 3 
binomial 
coefficient, 92-94, 111
theorem, 110, 116, 537 
birational 
equivalence, 216, 558, 575, 582 
transformation, 216, 558 
Birkhoff’s 
ergodic theorem, 464-471, 479, 489 
recurrence theorem, 485 
Blaschke’s selection principle, 353, 
356, 357, 360 
Blichfeldt’s theorem, 334 
block, 247, 250 
Bohl’s theorem, 448, 451, 458 
Bolzano–Weierstrass theorem, 25, 42
Boolean algebra, 75-76
Boolean ring, 61 
Borel subset, 472, 473, 479, 481 
bounded 
sequence, 24, 26 
set, 30 
variation, 461-462, 489
bracket product, 440 
Brahmagupta’s identity, 195-196, 304 
Brauer group, 324 
Brauer’s theorem, 425 
Brouwer’s fixed point theorem, 76 
Bruck–Ryser–Chowla theorem, 249,
321, 325
Brun’s theorem, 392, 395 
BSD conjecture, 570, 572-573, 578, 
583 
Burnside’s theorem, 427-428 


calendars, 119, 187 
cancellation law, 7, 76, 83, 147
canonical height, 561-565 
Cantor’s construction of reals, 18, 26, 
269 
Carathéodory extension of measure, 477 
Cardano’s formula, 39, 76 
cardinality, 10 
Carmichael number, 116, 125 
Cartesian product, 3 
cascade, 465 
Casimir operator, 443 
Catalan number, 93 
Cauchy sequence, 25, 30 
Cauchy’s theorem, 48, 534 
Cauchy–Schwarz inequality, 28, 72
Cayley number, 53
Cayley–Hamilton theorem, 50
central limit theorem, 76, 391, 395, 
489 
centralizer, 60 
centre, 49, 53, 68-70 
chain condition, 90, 95, 96, 98, 101, 148 
chain rule, 33 
character of 
abelian group, 400, 401, 434 
representation, 414, 437 
character theory, texts, 443 
characteristic 
function, 327 
of ring, 62-63, 103, 111 
polynomial, 283, 419 
Chebyshev’s functions, 367-370, 372, 
400 
Chevalley–Warning theorem, 116, 125
Chinese remainder theorem, 119, 125, 
267, 316 
chord and tangent process, 555 
circle method, 123, 392, 549 
class 
field theory, 125, 325 
function, 414, 417, 437 
number, 151, 207, 218, 578, 584 
classical mechanics, 429, 465, 481, 490 
classification of 
finite simple groups, 124, 251, 258 
simple Lie algebras, 442, 444 




Clebsch–Gordan formula, 438
Clifford algebra, 78, 324 
closed 

ball, 35, 344, 348 

sets, 28 
closure, 28, 204, 337, 485 
codes, 255, 388 
codeword, 256 
coding theory, texts, 258, 395 
coefficients, 66, 97 
combinatorial line, 488 
common divisor, 83 
common multiple, 84 
commutative 

group, 55, 59, 79, 109 

law, 7, 55, 83 

ring, 60-61, 75, 78 
compact 

abelian group, 434, 483 

group, 436, 437, 442 

metric space, 485 

set, 28, 43, 286-287, 432 
complement of set, 3, 61 
complete 

elliptic integral, 496, 503-509, 517, 

537 

metric space, 30-32, 75, 270 

ordered field, 23, 26 

quotient, 181-183, 212, 479 
completed zeta function, 381 
completion of 

metric space, 31 

valued field, 271 
complex 

analysis, 48, 77 

conjugate, 41 

integration, 77 

multiplication, 537, 574, 578 

number, 39-48, 69 
composite mapping, 4 
composite number, 88, 124 
composition of solutions, 196, 198, 199 
conductor of elliptic curve, 572 
conformal equivalence, 218 
congruence 

of integers, 106-119 




of symmetric matrices, 293 
subgroup, 210 
congruent, 106 
congruent numbers, 575-578, 584 
conic, 549, 550 
conjugacy class, 60, 125, 417 
conjugate 
character, 421 
complex number, 41 
element of quadratic field, 140 
group elements, 60, 417 
ideal, 146 
octonion, 54 
quadratic irrational, 192, 209-210 
quaternion, 49 
representation, 421 
connected set, 28, 44 
constant, 386 
coefficient, 97 
sequence, 24, 269 
contains, 2 
continued fraction 
algorithm, 179, 181, 211-212, 217 
expansion of Laurent series, 212, 
219 
expansion of real number, 179-182 
map, 479 
continued fractions in higher dimensions,
217
continued fractions, texts, 217 
continuous function (map), 26, 28, 29, 
43, 65 
continuously differentiable, 34, 36, 65, 
67 
contraction principle, 32-33, 35, 36, 
76, 285 
contragredient representation, 411 
convergence 
in measure, 29 
of compact sets, 353 
of lattices, 352-357 
convergent of Laurent series, 212 
convergent of real number, 181, 182, 
185-188, 480 
convergent sequence, 24-26, 30, 32, 
33, 269 




convex set, 327, 341 
convolution product, 152, 433 
Conway’s groups, 351 
coordinate, 3, 41, 47 
coprime, 85 
coset, 58, 63-64, 107 
representative, 58, 78 
right, left, 58 
countably infinite, 10 
covering, 337, 359 
critical 
determinant, 339, 348, 357 
lattice, 339, 357 
cross-polytope, 339 
cross-ratio, 500 
crystal, 346, 358 
crystallographic group, 346-347, 358 
crystallography, 358 
cube, 339 
cubic 
curve, 549-558 
polynomial, 39, 76 
cusp, 554, 571, 572 
form, 573 
cut, 18-21 
cyclic group, 59, 62, 110, 114-116, 
198, 251, 263, 423 
cyclotomic 
field, 151 
polynomial, 102, 111, 112, 135, 426 
cylinder set 
general, 478 
special, 476 


decimal expansion, 18, 107, 476, 509 
decomposable lattices, 348, 349 
Dedekind 

construction of reals, 18-22, 76

eta function, 549 

zeta function, 384-385, 395, 409, 

573 

Dedekind–Peano axioms, 5, 76
deduced representation, 420 
degree of 

affine curve, 549 

algebraic number, 214 

extension field, 67 
polynomial, 96-97
representation, 410, 414 
De Morgan’s laws, 3 
dense, 17-18, 275, 447 
sequence, 448, 452 
subset of metric space, 31, 270 
densest 
lattice, 348, 350-351, 359, 362 
packing, 359-360 
density of lattice packing, 348 
derivative, 33-36, 67, 103 
designs, 247-251, 254-258 
determinant, 53, 180, 181, 224-229, 
256 
of lattice, 333, 347, 356 
of quadratic space, 293 
diagonal matrix, 225 
diagonalization of quadratic form, 237-239,
294, 297
difference of sets, 3 
differentiable map, 33-34, 43, 75 
differential form, 256 
dimension of vector space, 67 
Diophantine 
approximation, 185, 212, 217, 219, 
329, 358 
equation, 161, 178, 195-201, 215- 
217, 219 
direct product of groups, 53, 60, 117, 
172, 401, 429, 443 
direct sum of 
rings, 64, 117 
vector spaces, 65 
Dirichlet 
L-function, 404-409, 443
character, 403-404, 574 
domain, 342 
product, 152-156, 175, 389 
series, 175, 389, 572
Dirichlet’s 
class number formula, 218 
convergence criterion, 138 
Dirichlet’s theorem on 
Diophantine approximation, 329 
primes in arithmetic progression, 218, 
313, 316, 400, 443 




units in number fields, 176 
discrepancy, 459-464, 488 
discrete 

absolute value, 275, 276 

group, 203, 330 

set, 342 

subgroup, 330, 490 
discriminant of 

binary quadratic form, 205-207 

elliptic curve, 553, 570, 579, 580 

lattice, 333 

quadratic irrational, 191 
disjoint sets, 2 
distance, 18, 27-31 
distributive 

lattice, 85 

law, 7, 60, 85 
divisibility tests, 107 
divisible, 83, 87, 146 
division 

algebra, 54, 78 

algorithm, 14, 90, 98 

ring, 62-70, 125
divisor, 83, 386 

of zero, 14, 62 
doubly-periodic function, 514-517, 

527, 538 
dual 

2-design, 249 

convex body, 340 

group, 401, 434, 443 

lattice, 334, 340, 538 
duality theorem, 435 
dynamical system, 30, 388, 395, 447 
Dynkin diagrams, 359 


e, 38, 187 
echelon form, 164 
eigenvalue, 239, 257, 430 
eigenvector, 239 
Eisenstein 
integers, 141-143 
irreducibility criterion, 102, 124 
element, 2 
elementary group, 425 
ellipse, 237, 493-494 
ellipsoid, surface area of, 495-496 




elliptic 
curves, 555-558, 569-574, 582-583 
functions, 509-516, 525-530 
elliptic integral, 496-503, 536-537 
of first kind, 498, 505, 514-516, 537 
of second kind, 498, 505, 516-517, 
530, 537 
of third kind, 498, 506, 517, 530, 
537 
empty set, 2, 61 
endomorphism of torus, 473, 483 
energy surface, 465, 481 
entropy, 482-483 
equal, 2 
equivalence class, 4 
equivalence of 
absolute values, 266 
complex numbers, 184, 191, 201, 210
fundamental sequences, 26, 31 
Hadamard matrices, 252 
ideals, 151 
matrices, 171 
quadratic forms, 293, 297-298, 311, 
317, 325 
representation, 410, 416 
equivalence relation, 4, 11, 26, 58, 106,
184 
Erdős–Turán inequality, 462, 489
ergodic, 465 
hypothesis, 464-465, 470 
measure-preserving transformation, 
470, 472-473, 478-479 
theorems, 466, 489 
ergodic theory, texts, 489 
error-correcting code, 28, 256, 258 
Euclidean 
algorithm, 91-92, 98, 104, 109, 121,
181, 560
distance, 28, 72, 346
domain, 104-106, 119-120, 124, 142,
171 
metric, 342 
norm, 229, 234, 342 
prime number theorem, 134, 363 
Euler’s 
angles, 439 
conjecture, 580-581, 584
constant, 159, 380 
criterion for quadratic residues, 112, 
135, 136, 159 
formulas for cos z and sin z, 45, 77 
parametrization of rotations, 51-53 
pentagonal number theorem, 545, 581 
phi-function, 109-110, 114, 154-157, 
389, 399, 403 
prime number theorem, 363-364 
product formula, 371, 381, 405 
theorem on homogeneous functions, 
551 
even 
lattice, 349, 351 
permutation, 57, 130, 224, 227 
eventually periodic, 18, 277 
continued fraction, 192 
exceptional simple Lie algebra, 442, 
444 
existence theorem for ordinary differential
equations, 36-38, 76, 510
exponential 
function, 38-39, 45-47
series, 38, 45 
sums, 388, 489 
extended Riemann hypothesis, 385,
395, 409

extension of absolute value, 273, 
283-284, 290 

extension of field, 45, 67, 283-284, 
288, 386 


exterior algebra, 256 
extreme point, 236 


face, 343 
facet, 343-345 
vector, 344-346 
factor, 83, 146 
group, 58-59 
factorial domain, 90, 101, 124, 151, 
153, 175 
factorization, 124, 127 
Faltings’ theorem, 215, 575, 583 
Fano plane, 248, 250 
Fermat equation, 142, 151, 575, 576, 
579 
Fermat number, 160-161, 175
Fermat prime, 160-161 
Fermat’s 
last theorem, 151-152, 175, 579- 
580, 584 
little theorem, 110-112, 116 
Fibonacci numbers, 95 
field, 23, 40, 62-63, 109, 125 
of fractions, 88, 100, 261 
field theory, texts, 79 
finite 
dimensional, 66, 67 
field, 109, 125, 232, 289, 298, 386-387,
570-571
field extension, 45, 67, 283-284, 288, 
386 
group, 57 
set, 10 
finitely generated, 63, 66, 95, 168, 172, 
176, 559 
Fischer’s inequality, 239, 243 
fixed point, 32, 35, 37 
theorems, 32, 76 
flex, 551-553, 557 
flow, 465, 481, 483 
formal 
derivative, 103, 112, 278 
Laurent series, 211, 219, 263, 271, 
289, 358 
power series, 96, 124, 275, 287 
Fourier 
integrals, 435-436, 443 
inversion formula, 435-436 
series, 79, 138, 378, 436, 443, 450 
transform, 374, 378, 435-436 
fraction, 15 
fractional part, 448, 479 
free 
action, 218 
product, 203 
subgroup, 172, 569 
submodule, 172 
Fresnel integral, 139 
Frobenius 
complement, 429 
conjectures, 124, 211 
group, 429, 443

kernel, 429 

reciprocity theorem, 420 

theorem on division rings, 69 
Fuchsian group, 218 
function, 4, 386
function fields, 386-388, 395 

and coding theory, 388, 395 
functional equation of 

L-functions, 409, 443, 572 

zeta functions, 380-381, 385, 387, 

443 

fundamental 

domain, 203, 333, 346, 499 

sequence, 25, 26, 30-33, 269, 270, 

274 

solution, 197-201 
fundamental theorem of 

algebra, 42-45, 69, 77, 98, 413 

arithmetic, 88-90, 124, 145 
Furstenberg’s theorems, 484-487, 490 
Furstenberg–Katznelson theorem, 485
Furstenberg–Weiss theorem, 485


Galois theory, 79, 160 
gamma function, 328, 379-380, 394, 
572 
Gauss 
class number problem, 218, 578, 584 
invariant measure, 479 
map, 479-480, 483, 489 
sum, 135-140, 174 
Gauss–Kuz’min theorem, 480, 489
Gaussian integer, 119-120, 141, 145, 
542 
Gaussian unitary ensemble, 384 
GCD domain, 87-90, 98, 100-101
gear ratios, 187 
Gelfand–Raikov theorem, 434
general linear group, 68, 251, 440 
generalized 
character, 428 
trigonometric polynomial, 75, 458 
upper half-plane, 218-219 
generated by, 59, 60, 63, 66, 91, 114, 
161, 331, 465 
generating function, 544, 581 
generator
matrix, 255
of cyclic group, 114 
genus of 
algebraic curve, 216, 395, 558 
field of algebraic functions, 387, 388 
geodesic, 208-210, 219, 388, 395, 482 
flow, 388, 395, 481, 483 
geometric 
representation of complex numbers, 
41,47 
series, 34 
geometry of numbers, texts, 357 
Golay code, 255, 256 
golden ratio, 179 
good lattice point, 464 
Good–Churchhouse conjectures,
391-392, 395 
graph, 30 
Grassmann algebra, 256 
greatest 
common divisors, 83-87, 89-92, 148 
common left divisor, 167 
common right divisor, 121 
lower bound, 19, 22, 24 
group, 55-60, 109 
generated by reflections, 359, 442, 
444 
law on cubic curve, 555-556, 565, 
582, 583 
group theory, texts, 78 


Haar 
integral, 432-433, 443 
measure, 358, 435, 474 
Hadamard 
design, 250-251 
determinant problem, 223, 243-247, 
250, 257 
inequality, 223, 229-230 
matrix, 223, 230-233, 250-257, 306, 
321, 350 
Hales–Jewett theorem, 488, 490
Hall’s theorem on solvable groups, 428, 
443 
Hamiltonian system, 481, 483 
Hamming distance, 28, 255 
Hardy–Ramanujan expansion, 549,
582 
Hasse 
invariant, 310, 311, 324 
principle, strong and weak, 317, 325 
Hasse–Minkowski theorem, 312-316,
324-325
Hasse–Weil (HW) conjecture, 572
Hausdorff 
distance, 352-353 
maximality theorem, 485 
metric, 353, 356, 357 
heat conduction equation, 519 
height of a point, 559-561 
Hensel’s lemma, 277-282, 290
Hermite 
constant, 347-348, 351, 359 
normal form, 175, 332 
highest coefficient, 97 
Hilbert field, 306-311, 324 
Hilbert space, 75, 79, 433 
Hilbert symbol, 303-307, 313, 324 
Hilbert’s problems 
5th, 440, 443
9th, 174
10th, 217 
17th, 323-324, 325 
18th, 358, 359-360 
H-matrix, 230-231, 244, 245
holomorphic function, 48, 124, 372, 
377, 404, 405, 512 
homogeneous linear equations, 68, 166 
homomorphism of 
groups, 52-53, 58-59, 566 
Lie algebras, 441 
Lie groups, 441 
rings, 63-64, 99, 111 
vector spaces, 67 
Horner’s rule, 99 
Hurwitz integer, 120-121, 126, 541 
hyperbolic 
area, 209 
geometry, 208-209, 219 
length, 208 
plane, 298, 299, 302 
hypercomplex number, 77 
hypergeometric function, 537
hyperreal number, 76 


ideal, 63, 90-91, 145, 148 
class group, 151 
in quadratic field, 146-151, 207 
of Lie algebra, 441 
identity 
element, 12, 20, 55-56, 60, 65, 83
map, 4 
Ikehara’s theorem, 367, 373, 385, 389, 
390, 408 
image, 4 
imaginary 
part, 40 
quadratic field, 140 
incidence matrix, 247-249 
included, 2 
indecomposable lattice, 348-349 
indefinite quadratic form, 205 
indeterminate, 96 
index of 
quadratic space, 296, 297 
subgroup, 58, 60, 419, 423-424, 568 
indicator function, 327, 358, 449, 
470 
individual ergodic theorem, 466 
induced representation, 419-423, 425 
induction, 9 
infimum, 19, 22 
infinite order, 59 
inflection point, 551 
inhomogeneous Lorentz group, 442 
injection, 4 
injective map, 4, 9, 68 
inner product space, 71-75, 79, 433 
integer, 10-15, 17 
of quadratic field, 106, 141 
integrable, 466 
in sense of Lebesgue, 32, 75, 327, 
435 
in sense of Riemann, 327, 358, 449, 
450 
integral 
divisor, 386 
domain, 62, 87, 96 
equations, 74, 79, 223 




lattice, 349, 538 
representation for Γ-function, 380,
573 
interior, 28, 333, 337, 343 
intersection 
of modules, 166 
of sets, 2-3, 61 
of subspaces, 65 
interval, 26, 29, 65 
invariant 
factor, 171 
mean, 437 
region, 481 
subgroup, 58 
subset, 485 
subspace, 411 
inverse, 12, 16, 55, 62, 153 
class, 417 
element, 55-56, 62, 556, 583
function theorem, 34-36, 76
map, 4-5 
inversion 
of elliptic integral, 514, 516
of order, 57, 129 
invertible 
element of ring, 62 
matrix of integers, 162 
measure-preserving transformation, 
466, 479 
involutory automorphism, 41, 140 
irrational number, 22, 179, 448, 451 
irrationality of √2, 18, 99
irreducible 
character, 414, 416-418, 437 
curve, 551, 552-553 
element, 89-90, 145 
ideal, 148 
polynomial, 98, 102, 111-112 
representation, 411-418, 434, 437 
irredundant representation, 344 
isometric 
metric spaces, 31 
quadratic spaces, 299 
isometry, 31, 75, 208, 299-303, 346, 430 
isomorphism, 5, 14, 17, 22, 23 
of groups, 59 
of measure-preserving transformations,
482-483 
of rings, 64 
of vector spaces, 68 
isotropic 
subspace, 295 
vector, 295 


Jacobi symbol, 130-134, 139, 174 
Jacobi’s 
imaginary transformation, 377, 520 
triple product formula, 518, 536, 545, 
548, 582 
Jacobian elliptic functions, 525-530 
join, 2 
Jordan–Hölder theorem, 124


Kepler conjecture, 359-360 
kernel of 
group homomorphism, 53, 59 
linear map, 68 
representation, 426 
ring homomorphism, 63 
Kervaire–Milnor theorem, 78
Kingman’s ergodic theorem, 489 
K-point 
affine, 549 
projective, 550, 555 
kissing number, 351, 359, 362 
Kronecker 
approximation theorem, 452, 488 
delta, 415 
field extension theorem, 45 
product, 68, 231, 233, 250, 255, 411 


Lagrange’s theorem 

on four squares, 120-122, 218, 541 

on order of subgroup, 58, 110, 114 
Landau order symbols, 194, 365, 590 
Landau’s theorem, 405-406
Landen’s transformation, 529, 536, 566 
Langlands program, 174, 178, 582 
Laplace transform, 373, 394, 405 
lattice, 85, 123, 141, 332-334, 348

in locally compact group, 490 

packing, 348, 359 

packing of balls, 348, 350, 359 
point, 328, 332
translates, 337-338 
Laurent 
polynomial, 98, 216 
series, 48, 211, 263, 287, 289, 358, 
373 
law of 
iterated logarithm, 391, 395, 489 
Pythagoras, 17, 73-74, 108 
quadratic reciprocity, 129, 133-136, 
151, 174, 314 
trichotomy, 8, 13, 21 
least 
common multiple, 84-87, 89 
common right multiple, 167 
element, 8 
non-negative residue, 107, 115 
upper bound, 19, 22, 24 
least upper bound property (P4), 19, 
22, 26 
Lebesgue measurable, 32, 75 
Lebesgue measure, 32, 327, 465, 472, 
473, 480 
Leech lattice, 350-351, 359, 362 
Lefschetz fixed point theorem, 76 
left 
Bézout identity, 121 
coprime matrices, 167-168
coset, 58 
divisor, 167 
Legendre 
interchange property, 517 
normal form, 498 
polynomials, 74 
relation, 508 
symbol, 129, 133, 135, 149, 232, 
305 
theorem on ternary quadratic forms, 
312 
lemniscate, 495, 509, 536, 537 
less than, 8 
L-function, 404, 443, 571-574, 578 
Lie 
algebra, 440-442, 443-444 
group, 439-442, 443-444 
subalgebra, 440 
subgroup, 441
limit, 24, 30, 269 
linear 
code, 255, 388 
combination, 65 
differential system, 172-173 
Diophantine equation, 91, 161, 
165-166 
fractional transformation, 179, 208, 
500, 531, 535 
map, 67 
systems theory, 176 
transformation, 67 
linear algebra, texts, 79 
linearly dependent, 66 
linearly independent, 66 
Linnik’s theorem, 409 
Liouville’s 
integration theory, 537 
theorem in complex analysis, 77, 517 
theorem in mechanics, 481 
Lipschitz condition, 460 
Littlewood’s theorem, 383, 395 
LLL-algorithm, 358 
L²-norm, 72
local-global principle, 317-318, 325 
locally compact, 28 
group, 358, 432-436, 444, 490 
topological space, 432 
valued field, 284-290
locally Euclidean topological space, 440 
logarithm, 39, 364 
lower 
bound, 14, 19, 22 
limit, 24 
triangular matrix, 229 
Lucas–Lehmer test, 158, 175


Mahler’s compactness theorem, 357, 
360, 362 
map, 4 
mapping, 4 
Markov 
spectrum, 210, 219 
triple, 210-211, 222 
marriage theorem, 78
Maschke’s theorem, 411, 431 
Mathieu groups, 251, 254, 255
‘matrix’, 162, 168 
matrix theory, texts, 79, 257 
maximal ideal, 64, 148, 269, 274 
maximal totally isotropic subspaces, 296 
Mazur’s theorem, 570 
mean motion, 458, 489 
measurable function, 29, 466 
measure theory, texts, 489 
measure zero, 29, 32, 482 
measure-preserving transformation, 
466-473, 477-484 
meet, 2 
Mellin transform, 573 
Méray–Cantor construction of reals, 18,
26 
Merkur’ev’s theorem, 324
meromorphic function, 48, 263, 512, 
538 
Mersenne prime, 158-159, 175 
Mertens’ theorem, 364 
method of successive approximations, 
32, 36, 38 
metric space, 27-32, 72, 255, 268 
Meyer’s theorem, 313, 316 
minimal 
basis, 173 
model, 571, 580, 583 
polynomial, 283 
vector, 345, 346 
minimum of a lattice, 345-347, 356, 
357 
Minkowski’s theorem on 
discriminant, 330, 358 
lattice points, 328-330, 338-339, 358 
linear forms, 328 
successive minima, 339-341, 358 
minor, 171 
mixing transformation, 480, 483 
Möbius
function, 156, 390-392, 395 
inversion formula, 156-157, 175 
modular 
elliptic curve, 573-574, 578, 584, 
586 
form, 258, 544, 573-574, 578, 583 
function, 531-536, 538

group, 202-205, 531 

transformation, 201 
module, 161, 166-169, 171-172, 175 
modulo m, 106 
monic polynomial, 97, 151, 174, 262 
monotonic sequence, 24-26
Monster sporadic group, 258 
Montgomery’s conjecture, 383-384,
395

Mordell conjecture, 215, 216, 575, 583
Mordell’s theorem, 176, 559, 565-569 
multiple, 83 
multiplication, 60 

by a scalar, 64 

of integers, 12 

of natural numbers, 7 

of rational numbers, 16 
multiplicative 

function, 154-155, 175 

group, 62, 114-115, 125, 292 

inverse, 16 


Nagell–Lutz theorem, 569, 578
natural 

logarithm, 39, 364 

number, 5-10, 14
nearest neighbour conjecture, 384 
negative 

definite quadratic space, 296 

index, 297 

integer, 13 
neighbourhood, 33 
Nevanlinna theory, 215, 219 
Newton’s method, 277, 290 
node, 554, 571, 572 
non-archimedean absolute value, 261,
264, 273-276

non-associative, 53-54, 78 
non-Euclidean 

geometry, 208-209, 219 

line, 208 

triangle, 209, 533 
non-negative linear functional, 432 
nondecreasing sequence, 24, 25 
nondegenerate lattice, 332 
nonincreasing sequence, 24, 25 
non-singular
cubic curve, 555 
linear transformation, 68 
matrix, 227 
point, 549, 550 
projective curve, 387 
projective variety, 387 
quadratic subspace, 293 
norm of 
n-tuple, 28 
complex number, 119 
continuous function, 28, 29 
element of quadratic field, 106, 
140-141 
ideal, 384 
integral divisor, 386 
linear map, 34 
octonion, 54 
prime divisor, 386 
quaternion, 50-51, 121 
vector, 71, 271, 341 
norm-Euclidean domain, 106 
normal 
form for cubic curve, 552-553, 
556-558 
frequencies, 430 
modes of oscillation, 430 
number, 474, 476, 489 
subgroup, 58-59, 79, 421, 427 
vector, 474-476, 489 
normed vector space, 271, 287, 341 
n-th root of 
complex number, 44, 47, 77 
positive real number, 23 
n-tuple, 3, 28, 64 
nullity of linear map, 68 
nullspace of linear map, 68 
number theory, texts, 123 
numbers, 1, 74 
numerical integration, 460-464, 489 


octave, 53 

octonion, 53-55, 77, 82, 442 

odd permutation, 57, 129, 130, 224, 
227 

one (1), 5, 60 

one-to-one, 4 
correspondence, 4
open 
ball, 28, 33, 342 
set, 28, 43 
operations research, 78 
Oppenheim’s conjecture, 324, 325 
order in natural numbers, 8 
order of 
element, 59, 113, 114 
group, 57, 109, 113, 114 
Hadamard matrix, 230 
pole, 48 
projective plane, 248 
ordered field, 23, 26, 41, 76, 79, 280, 
296-297, 308 
ordinary differential equation, 36-38, 
76, 510 
orientation, 225, 347 
Ornstein’s theorem, 483 
orthogonal 
basis, 74, 294, 335 
complement, 293 
group, 440 
matrix, 52, 238 
set, 73 
sum, 293, 349 
vectors, 73, 293 
orthogonality relations, 402, 415-416, 
418, 437 
orthonormal set, 73-75
Oseledets ergodic theorem, 489 
Ostrowski’s theorems, 266, 284, 290,
313 


packing, 337, 359-360 
p-adic 
absolute value, 262 
integer, 275, 277, 287 
number, 18, 271, 275, 277, 287, 288, 
310, 313, 436 
pair correlation conjecture, 383-384, 
395 
Paley’s construction, 231-233, 255, 350
parallelogram law, 73, 79, 561, 564 
parallelotope, 229, 333 
parametrization, 51-52, 217, 219, 
554-555, 558, 574 
Parseval’s equality, 74-75, 335-336
partial 
fractions, 497 
order, 85 
quotient, 181, 190, 212, 480 
partition of 
positive integer, 544-549 
set, 4, 58 
partition theory, texts, 581 
Pascal triangle, 94 
path-connected, 44, 437, 441 
Peano axioms, 5, 76 
Pell equation, 144, 196-201, 217 
for polynomials, 213, 219 
pendulum, period of, 494 
Pépin test, 160-161 
percolation processes, 489 
perfect number, 157-159, 175 
period of continued fraction, 192-194, 
197-199, 217 
periodicity of 
continued fraction, 192-194, 209, 217 
elliptic functions, 513-516, 527, 538 
exponential function, 46-47 
permutation, 57, 129-131, 227 
perpendicular, 73 
Perron–Frobenius theorem, 480
Pfister’s multiplicative forms, 323 
pi (π), 46, 48, 186, 217, 222, 364, 509
Picard’s theorem, 538 
pigeonhole principle, 10, 57 
Plancherel theorem, 435 
Poincaré 
model, 208, 219 
recurrence theorem, 483-484 
point, 247, 250, 549, 550 
at infinity, 550, 552, 553 
pointwise ergodic theorem, 466 
Poisson summation, 138, 378, 394, 435, 
538 
polar 
coordinates, 47, 495 
lattice, 334 
pole of order n, 48 
poles of elliptic functions, 526-527 
polynomial, 96-103 
part, 211
ring, 87, 104 
polytope, 343, 358 
Pontryagin–van Kampen theorem, 435
positive 
index, 297 
integer, 13-14 
measure, 433 
rational number, 16-17 
real number, 18, 22 
semi-definite matrix, 235, 239 
positive definite 
matrix, 235, 239 
quadratic form, 205 
quadratic space, 296 
rational function, 323-324 
power series, 45, 46, 48 
primality testing, 124, 127 
prime 
divisor, 386 
element, 89, 145 
ideal, 148-151, 384-385 
ideal theorem, 385, 393, 395 
number, 88-89 
prime number theorem, 365-367, 369-377,
390, 394-395
for arithmetic progressions, 394, 400, 
403-408 
primitive 
Dirichlet character, 409 
polynomial, 100 
quadratic form, 206 
root, 115-116, 124-125, 385 
root of unity, 111, 112, 114 
principal axes transformation, 238, 257 
principal character, 401 
principal ideal, 91, 145 
domain, 92, 95-96, 98, 104,
105-106, 168-172 
principle of the argument, 533 
probability 
measure, 466 
space, 466 
theory, 29, 75, 76, 391, 395, 582 
problem of moments, 220 
problem, 3x + 1, 490 
product
formula for theta functions, 519,521 
formula for valuations, 267 
measure, 476 
of ideals, 145 
of integers, 12 
of linear maps, 67 
of natural numbers, 7 
of rational numbers, 16 
of representations, 411 
of sets, 3 
projective 
completion, 550-551, 552-553 
conic, 550 
cubic, 550 
equivalence, 551-553 
line, 550 
plane, 248, 250, 321, 325 
plane curve, 550 
space, 53 
proper 
divisor, 89 
subset, 2 
properly equivalent 
complex numbers, 184, 201 
quadratic forms, 205-206
properly isomorphic, 347 
public-key cryptography, 124 
Puiseux expansion, 44 
pure 
imaginary complex number, 41 
quaternion, 51, 52 
Pythagoras’ theorem (or law), 17, 
73-74, 108 
Pythagorean triple, 108, 217 


q-binomial coefficient, 95, 581 
q-difference equation, 546 
q-hypergeometric series, 581 
q-integral, 581 
quadratic 
field, 105-106, 124, 140-151, 174, 
207, 218 
form, 205-207, 291-293, 563 
irrational, 191-194, 206, 209-210, 
214, 537 
nature, 129, 133 
non-residue, 112-113, 121, 129,
132-133, 385 
polynomial, 42, 283 
residue, 112-113, 121, 129, 
132-133, 280 
space, 292-303, 324 
quadratic spaces, texts, 324 
quantum group, 582 
quartic polynomial, 76-77 
quasicrystal, 359 
quasiperiodic tiling, 359 
quaternion, 48-53, 69, 77-82, 120-122, 
541 
quaternionic analysis, 77 
Quillen-Suslin theorem, 176
quotient, 15, 48, 90 
group, 58 
ring, 63, 107, 384 
space, 209, 210, 218 


Rådström’s cancellation law, 353, 360
Ramanujan’s tau-function, 388, 395 
random matrices, 384, 395, 489 
range of linear map, 68 
rank of 
elliptic curve, 569-570, 572 
linear map, 68 
rational 
function, 88, 212, 262, 386 
number, 15-17, 181-182, 277 
transformation, 558 
real 
analysis, 26, 76 
number, 22-26
part, 40 
quadratic field, 140 
reciprocal lattice, 334 
reciprocity for Gauss sums, 137 
recurrence for number of partitions, 545 
recursion theorem, 5-6
reduced 
automorphism group, 252 
lattice basis, 358 
quadratic form, 206, 207 
quadratic irrational, 192-194 
reducible 
curve, 551 
polynomial, 282
representation, 411 
Reed-Muller code, 255-256
refinement theorems, 86, 123-124 
reflection, 53, 300 
reflexive relation, 4 
regular prime, 151 
regular representation, 410, 416-417, 
434, 437 
relatively 
dense set, 343 
prime, 85, 167 
relevant vector, 344 
remainder, 90 
theorem, 99 
replacement laws, 106 
representation of 
compact group, 436-439
finite group, 410-413, 437, 443 
group, 410, 442-444 
locally compact group, 434-436 
representative of 
coset, 58, 78 
residue field, 276 
representatives, distinct, 78 
represented by quadratic form, 294, 295 
residue, 48, 371, 390 
class, 107, 400 
field, 274-276, 386 
resolution of singularities, 558 
restriction of map, 4 
Ribet’s theorem, 580 
Riemann 
integrable, 327, 358, 449, 450, 452 
normal form, 498-502, 510,
555-556 
surface, 218 
zeta function, 366, 370-373, 
380-384, 390-392, 394 
Riemann hypothesis, 381-383, 391-392,
395, 398 
for algebraic varieties, 388, 395 
for elliptic curves, 571, 574, 583 
for function fields, 387-388, 395,
583 
Riemann-Lebesgue lemma, 376
Riemann-Roch theorem, 341, 358, 387
Riemannian manifold, 388, 481-482, 
483 
Riesz representation theorem, 433 
Riesz-Fischer theorem, 75
right 
coset, 58-60 
multiple, 167 
vector spaces, 68 
ring, 60-64, 68, 96, 107 
ring theory, texts, 78 
Rogers-Ramanujan identities, 546-549,
582 
root, 99-100, 219, 277 
lattice, 348-351, 359 
Roth’s theorem on algebraic numbers, 
123, 214-216, 219 
ruler and compass constructions, 160, 
175 


scalar, 64 
schemes, 582 
Schmidt’s 
discrepancy theorem, 463-464, 489 
subspace theorem, 214-215, 219 
Schmidt’s orthogonalization process, 74, 
230 
Schreier’s refinement theorem, 123-124 
Schur’s lemma, 413, 415, 417, 418 
Schwarz’s inequality, 28, 72, 234, 373, 
453 
self-dual lattice, 334, 335, 340 
semi-simple 
Lie algebra, 441-442, 444 
Lie group, 441-442, 444 
semi-stable elliptic curve, 572, 574, 580 
semidirect product, 428 
semigroup, 76 
Serre’s 
ε-conjecture, 580
conjecture, 175-176 
set, 2-4, 61 
set of representatives, 276, 289 
shift map, 477, 479, 487 
Siegel’s 
formula, 335 
lemma, 340, 358 
modular group, 218-219
theorem on Diophantine equations, 216,
217, 219, 583 
sigma algebra, 433, 465 
sign of a permutation, 57, 130-131, 
227 
signed permutation matrix, 242, 246, 
252 
simple 
associative algebra, 69 
basis, 349-350 
group, 58, 251, 258, 427, 428 
Lie algebra, 441-442, 444 
Lie group, 77, 251 
pole, 48 
ring, 63 
simply-connected, 53, 437 
covering space, 53, 441-442 
Lie group, 441-442 
simultaneous diagonalization, 239, 429 
singular matrix, 227, 228 
small 
divisor problems, 219-220 
oscillations, 257, 429-430 
Smith normal form, 169-173, 176 
sojourn time, 470 
solvable 
by radicals, 39, 79 
group, 428 
Lie algebra, 443-444 
spanned by, 66 
special, 53 
linear group, 202, 229 
orthogonal group, 53, 438-439, 442 
unitary group, 53, 437-438, 442 
spectrometry, 236, 257 
spherical trigonometry, 538 
sporadic simple group, 251, 258, 351 
square, 14 
class, 292, 293, 565-566 
design, 249, 258, 321 
square 2-design, 249, 258, 321 
square root of 
complex number, 39, 42 
positive real numbers, 22-23, 24-25
square-free 
element, 90
integer, 90, 140 
polynomial, 103, 112 
square-norm, 342, 345 
star discrepancy, 459, 464 
Steiner system, 250 
step-function, 449 
Stieltjes integral, 220, 368, 394 
Stirling’s formula, 328, 380 
Stone’s representation theorem, 62, 76 
Stone-Weierstrass theorem, 488
strictly proper part, 211 
strong 
Hasse principle, 317-318 
triangle inequality, 30, 32, 262 
structure theorem 
for abelian groups, 172, 569 
for modules, 172 
subadditive ergodic theorem, 489 
subgroup, 56-57 
subset, 2, 61 
subspace, 65-68 
successive 
approximations, 32, 36, 38 
minima, 339-341, 358 
successor, 5 
sum of 
linear maps, 68 
modules, 166-167 
natural numbers, 6-7
points of elliptic curve, 556, 583 
representations, 411, 412 
subspaces, 65 
sum of squares, 51, 55, 78, 126, 544, 
581 
four, 51, 120-122, 218, 541-542 
three, 108, 120, 318-319 
two, 108, 119-120, 199-201, 218, 
247, 542-544 
for polynomials, 322-323, 325 
for rational function, 323-324, 325 
supplements to law of quadratic reciprocity, 133, 314
supremum, 19, 22 
surface 
area of ellipsoid, 495-496 
of negative curvature, 388-389, 395,
482, 483 
surjection, 4 
surjective map, 4, 10, 68 
Sylvester’s law of inertia, 297 
symmetric 
difference, 61, 465 
group, 57, 227, 423-425 
matrix, 232, 238-239, 292-294, 301 
relation, 4 
Riemannian space, 219 
set, 328, 341 
symmetric 2-design, 249 
symmetry group, 346, 430 
symmetry operation, 430 
symplectic matrix, 219, 440, 442 
systems of distinct representatives, 78 
Szemerédi’s theorem, 484-485, 490


tangent 
space, 440, 481-482 
to affine curve, 550 
to projective curve, 550 
taxicab number, 117 
Taylor series, 48, 373, 406 
t-design, 250, 254-255
tensor product, 68, 303 
theta functions, 379, 519-525, 530, 535, 
541-544 
of lattice, 538 
tiling, 204, 337, 343, 344, 346, 358, 
359, 499 
topological 
entropy, 388 
field, 268 
group, 432, 440, 442, 443 
topology, 28, 268 
torsion 
group of elliptic curve, 569-570, 577 
subgroup, 59, 172 
submodule, 172 
torsion-free, 346, 358 
torus, n-dimensional, 439, 441, 472-473 
total 
order, 8, 18, 24 
variation, 461-462, 489 
totally isotropic subspace, 295-296, 299 
totient function, 110
trace of 
matrix, 414 
quaternion, 49-50 
transcendental element, 386 
transcendental number, 174, 578, 583 
transformation formulas 
for elliptic functions, 512, 528-529, 
538 
for theta functions, 377-379, 520, 
522 
transitive 
law, 8 
relation, 4 
translation, 346 
of torus, 472, 483 
transpose of a matrix, 227, 229 
triangle inequality, 27, 262 
triangular matrix, 229 
trichotomy law, 8 
trigonometric 
functions, 45-47, 77
polynomial, 450, 488 
trivial 
absolute value, 262, 273 
character, 401 
representation, 410 
ring, 62 
TW-conjecture, 574, 578, 580, 584, 586 
twin prime, 392-393, 395 
twisted L-functions, 574 
2-design, 247-250, 258
type (A) Hilbert field, 308-310 
type (B) Hilbert field, 308-311 


ultrametric inequality, 262, 273, 277 
uniform distribution, texts, 488 
uniformization theorem, 218 
uniformly distributed mod 1, 448-458 
union of sets, 2-3, 61
unique factorization domain, 90 
unit, 62, 87, 108, 144-145, 153 
unit circle, 47-48 
unit tangent bundle, 481-482 
unitary
group, 440, 442
matrix, 53, 437
representation, 412, 434, 437
symplectic group, 440, 442
universal quadratic form, 295, 297 
upper
bound, 19, 22
density, 484
half-plane, 201, 208-209, 519, 531,
534
limit, 24 
triangular matrix, 229 


valuation 
ideal, 274-276 
ring, 88, 274-276 
valuation theory, texts, 290 
value, 4 
group, 262-263, 274, 276 
valued field, 261-264 
van der Corput’s 
difference theorem, 454, 458 
sequence, 463, 464 
van der Waerden’s theorem, 485-488,
490 
vector, 64 
space, 64-70 
vertex of polytope, 343 
volume, 327 
von Mangoldt function, 370, 372 
Voronoi cell, 342-346, 349, 359
of lattice, 344-346, 353-357, 359 
Voronoi diagram, 358 


Waring’s problem, 122-123, 126 
weak Hasse principle, 317 
Wedderburn’s theorem on
finite division rings, 125
simple algebras, 69
Weierstrass approximation theorem, 74,
450, 488
weighing, 233-236, 257
matrix, 234-236, 257
weight of vector, 254-256
Weil conjectures, 387-388, 395 
Weyl’s criterion, 451 
Wiener’s Tauberian theorem, 367, 394 
Wiles’ theorem, 575, 580, 584 
Williamson type, 233 
Wilson’s theorem, 112, 122
Witt 
cancellation theorem, 302, 303, 317 
chain equivalence theorem, 310 
equivalence, 303 
extension theorem, 302 
ring, 303, 323, 324 


zero, 12, 60 
zeros of elliptic functions, 526-527 
zeta function, 366, 370-373, 380-384,
394, 398 
generalizations, 389, 395 
of function field, 386-387 
of number field, 384-385, 395, 409


